General Computing
linux debian filesystems freeze disk-encryption
Updated Thu, 01 Sep 2022 00:49:09 GMT

Debian/testing Linux on an encrypted LVM - filesystem intensive operations cause system to freeze and crash system-journald service


I use Debian/testing Linux on my multiple machines. For security reasons, I always install it on an encrypted LVM. Typically I use the ext4 filesystems with size 1TB to 3TB.

Unfortunately, it creates certain very nasty side-effects.

When I perform a filesystem-intensive operations (e.g. compiling of a few students' Buildroot projects in parallel, archiving 20 GB of data in tar.xz format, or installation of a software like Xilinx Vivado, which writes ca. 130 GB to the filesystem), the system freezes periodically for ca. 2 minutes. When I run dmesg, I get the following annoying messages:

[ 8648.672075] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 8648.672083] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 8648.672140] systemd[1]: systemd-journald.service: Unit process 7708 (systemd-journal) remains running after unit stopped.
[ 8648.672299] systemd[1]: Failed to start Journal Service.
[ 8648.672679] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 14.
[ 8648.672911] systemd[1]: Stopped Journal Service.
[ 8648.672980] systemd[1]: systemd-journald.service: Found left-over process 7708 (systemd-journal) in control group while starting unit. Ignoring.
[ 8648.672983] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 8648.673699] systemd[1]: Starting Journal Service...
[ 8738.922289] systemd[1]: systemd-journald.service: start operation timed out. Terminating.
[ 8828.923063] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing.
[ 8828.923105] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 8828.923141] systemd[1]: systemd-journald.service: Killing process 7708 (systemd-journal) with signal SIGKILL.
[ 8919.173428] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
[ 9009.423787] systemd[1]: systemd-journald.service: State 'final-sigterm' timed out. Killing.
[ 9009.423831] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 9099.674142] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 9099.674173] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 9099.674241] systemd[1]: systemd-journald.service: Unit process 7854 (systemd-journal) remains running after unit stopped.
[ 9099.674477] systemd[1]: Failed to start Journal Service.
[ 9099.674924] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 15.
[ 9099.675102] systemd[1]: Stopped Journal Service.
[ 9099.675185] systemd[1]: systemd-journald.service: Found left-over process 7854 (systemd-journal) in control group while starting unit. Ignoring.
[ 9099.675209] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 9099.675958] systemd[1]: Starting Journal Service...

The interesting thing is that the problem occurres more often on machines with bigger RAM (I use machines with the following RAM capacities: 16GB (Intel i7 CPU), 32GB (Intel i7 CPU), or 64GB (Intel Xeon E-2176M CPU)).

I have found the old similar problem reports like:

Based on the above, I have checked that I use the deadline scheduler:

# cat /sys/block/sda/queue/scheduler
[mq-deadline] none

I also use quite new kernel:

# uname -a
Linux WZabHP 5.10.0-8-amd64 #1 SMP Debian 5.10.46-2 (2021-07-20) x86_64 GNU/Linux

I have even tried to switch to the RT kernel in one of those machines (32 GB, i7 CPU), but it didn't help. Even worse - the freezes resultes in filesystem corruption (the effect repeated twice, after that I stopped trying to use the RT kernel).

What can be the cause of the described problem? Which settings should I check/adjust to cure it?

Update

I have found yet another set of posts suggesting that the parameters of the virtual memory may be associated with the described problem:

However, it seems that tuning the vm.dirty_background_ratio, and vm.dirty_ratio requires analysis of the speed of the disk.

Update 2 Before the problem with dying journald occures, the following appears in the kernel log (these are the first timeout-related errors):

Jul 26 20:44:38 WZabHP kernel: [  484.449978] INFO: task StreamTrans #6:2335 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [  484.449981]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [  484.449981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [  484.449982] task:StreamTrans #6  state:D stack:    0 pid: 2335 ppid:  1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [  484.449985] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [  484.449989]  __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [  484.449991]  schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [  484.449997]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [  484.450000]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [  484.450013]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [  484.450016]  __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [  484.450017]  do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [  484.450019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [  484.450021] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450022] RSP: 002b:00007fd2b8d528c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [  484.450023] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450024] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:44:38 WZabHP kernel: [  484.450025] RBP: 00007fd2af506ba0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:44:38 WZabHP kernel: [  484.450025] R10: 000200050000002a R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [  484.450026] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:44:38 WZabHP kernel: [  484.450030] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [  484.450031]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [  484.450031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [  484.450032] task:mozStorage #2   state:D stack:    0 pid: 2430 ppid:  1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [  484.450033] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [  484.450034]  __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [  484.450036]  schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [  484.450039]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [  484.450040]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [  484.450047]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [  484.450049]  __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [  484.450050]  do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [  484.450051]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [  484.450052] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450053] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [  484.450054] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [  484.450055] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000073
Jul 26 20:44:38 WZabHP kernel: [  484.450055] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [  484.450056] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f7
Jul 26 20:44:38 WZabHP kernel: [  484.450057] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e
Jul 26 20:48:40 WZabHP kernel: [  726.109957] INFO: task jbd2/dm-2-8:591 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.109959]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.109960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.109961] task:jbd2/dm-2-8     state:D stack:    0 pid:  591 ppid:     2 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.109963] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.109967]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.109969]  ? out_of_line_wait_on_bit_lock+0xb0/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109970]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109971]  io_schedule+0x42/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.109973]  bit_wait_io+0xd/0x50
Jul 26 20:48:40 WZabHP kernel: [  726.109974]  __wait_on_bit+0x2a/0x90
Jul 26 20:48:40 WZabHP kernel: [  726.109975]  out_of_line_wait_on_bit+0x92/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.109977]  ? var_wake_function+0x20/0x20
Jul 26 20:48:40 WZabHP kernel: [  726.109982]  jbd2_journal_commit_transaction+0x16b3/0x1ad0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109987]  kjournald2+0xab/0x270 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109988]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.109991]  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.109993]  kthread+0x11b/0x140
Jul 26 20:48:40 WZabHP kernel: [  726.109994]  ? __kthread_bind_mask+0x60/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.109996]  ret_from_fork+0x22/0x30
Jul 26 20:48:40 WZabHP kernel: [  726.109999] INFO: task journal-offline:2579 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.109999]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110001] task:journal-offline state:D stack:    0 pid: 2579 ppid:     1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [  726.110002] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110003]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110004]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110007]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110009]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110021]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110024]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110025]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110029] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110030] RSP: 002b:00007f68be229cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110031] RAX: ffffffffffffffda RBX: 000056427bbd8870 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110032] RDX: 00007f68c5a2b000 RSI: 00007f68c7649414 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [  726.110032] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68be22a700
Jul 26 20:48:40 WZabHP kernel: [  726.110033] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [  726.110034] R13: 00007ffe691a51df R14: 00007f68be229e00 R15: 000056427bbf9920
Jul 26 20:48:40 WZabHP kernel: [  726.110035] INFO: task journal-offline:2580 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110036]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110037] task:journal-offline state:D stack:    0 pid: 2580 ppid:     1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [  726.110038] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110039]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110040]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110043]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110045]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110051]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110053]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110054]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110056] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110057] RSP: 002b:00007f68bda28cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110058] RAX: ffffffffffffffda RBX: 000056427bc095d0 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [  726.110059] RDX: 00007f68c622b000 RSI: 00007f68c7649414 RDI: 0000000000000026
Jul 26 20:48:40 WZabHP kernel: [  726.110059] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68bda29700
Jul 26 20:48:40 WZabHP kernel: [  726.110060] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [  726.110061] R13: 00007ffe691a51df R14: 00007f68bda28e00 R15: 0000000000802000
Jul 26 20:48:40 WZabHP kernel: [  726.110064] INFO: task NetworkManager:1090 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110065]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110066] task:NetworkManager  state:D stack:    0 pid: 1090 ppid:     1 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110067] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110068]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110069]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110072]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110073]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110080]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110081]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110083]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110084]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110085] RIP: 0033:0x7fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [  726.110085] RSP: 002b:00007ffedee571c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110086] RAX: ffffffffffffffda RBX: 00007fd15cd06928 RCX: 00007fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [  726.110087] RDX: 0000000000000184 RSI: 000055c28ee17560 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [  726.110088] RBP: 000000000000001a R08: 0000000000000000 R09: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [  726.110088] R10: 0000000000000184 R11: 0000000000000293 R12: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [  726.110089] R13: 000055c28ed78fd0 R14: 000055c28ee176e4 R15: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110104] INFO: task Permission:2299 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110105]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110106] task:Permission      state:D stack:    0 pid: 2299 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110107] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110108]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110109]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110112]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110113]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110120]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110122]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110123]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110124]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110125] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110125] RSP: 002b:00007fd2beffe3e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110126] RAX: ffffffffffffffda RBX: 00007fd2bf0582e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110127] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000089
Jul 26 20:48:40 WZabHP kernel: [  726.110128] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2cec47af0
Jul 26 20:48:40 WZabHP kernel: [  726.110128] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [  726.110129] R13: 0000000000000000 R14: 0000000000010400 R15: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [  726.110131] INFO: task StreamTrans #3:2319 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110132]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110133] task:StreamTrans #3  state:D stack:    0 pid: 2319 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110134] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110135]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110136]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110139]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110140]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110146]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110148]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110149]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110150]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110151] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110152] RSP: 002b:00007fd2bba8e8c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110153] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110153] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [  726.110154] RBP: 00007fd2aecd16d0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:48:40 WZabHP kernel: [  726.110155] R10: 0002000500000049 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110155] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:48:40 WZabHP kernel: [  726.110157] INFO: task QuotaManager IO:2333 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110158]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110159] task:QuotaManager IO state:D stack:    0 pid: 2333 ppid:  1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [  726.110160] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110161]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110162]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110165]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110166]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110172]  ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110173]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110175]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110176]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110177] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110177] RSP: 002b:00007fd2b8d93020 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110178] RAX: ffffffffffffffda RBX: 00007fd2ba6f52e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110179] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000047
Jul 26 20:48:40 WZabHP kernel: [  726.110179] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2b8c1b910
Jul 26 20:48:40 WZabHP kernel: [  726.110180] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [  726.110180] R13: 0000000000000000 R14: 0000000000000a00 R15: 000000000000003b
Jul 26 20:48:40 WZabHP kernel: [  726.110184] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [  726.110185]       Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [  726.110185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [  726.110186] task:mozStorage #2   state:D stack:    0 pid: 2430 ppid:  1945 flags:0x00000000
Jul 26 20:48:40 WZabHP kernel: [  726.110187] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [  726.110188]  __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [  726.110189]  schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [  726.110192]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110193]  ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [  726.110196]  __jbd2_journal_force_commit+0x5d/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110199]  jbd2_journal_force_commit+0x1d/0x30 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [  726.110205]  ext4_sync_file+0x2c4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [  726.110206]  __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [  726.110207]  do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [  726.110209]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [  726.110209] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110210] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [  726.110211] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [  726.110212] RDX: 00000000000a0000 RSI: 00007fd2b334c0f0 RDI: 0000000000000072
Jul 26 20:48:40 WZabHP kernel: [  726.110212] RBP: 0000000000000002 R08: 0000000000000000 R09: 000000000000003e
Jul 26 20:48:40 WZabHP kernel: [  726.110213] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [  726.110213] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e

So the problem is related with to loow synchronization of the filesystem. The disk throughput is sufficient. However, the problem appears in installations with encrypted volumes. The CPU performance (especially in case of the Xeon machine) is more than sufficient to perform encryption quickly enough.

Another scarce resource needed for encryption is the true random data. Indeed the system that I can access now appeared to have neither rng-tools5 nor rng-tools-debian installed. That's probable source of the problem.




Solution

I have found the similar problem described in https://askubuntu.com/questions/1406444/what-is-causing-my-system-to-stall-freeze-corrupt-data-when-using-lvm-luks .

One of modifications described here has eliminated freezing the system at massive writes.

cryptsetup  --perf-no_write_workqueue refresh name_of_the_mapping

(in my case it was: cryptsetup --perf-no_write_workqueue refresh sda6_crypt )

The option may be set permanently by doing:

cryptsetup  --perf-no_write_workqueue --persistent refresh name_of_the_mapping

I have reported the problem as a bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016474 , but it is questionable whether this is a bug in the cryptsetup package.

So the full solution seems to be:

  1. Installation of the haveged package
  2. Using the --perf-no_write_workqueue option in mapping of the encrypted LVM.