debcrafters-packages team mailing list archive
Message #08587
[Bug 2062568] Re: nfsd gets unresponsive after some hours of operation
I got a practically identical client hang and stack trace to the one in comment #35:
INFO: task apache2:2566 blocked for more than 122 seconds.
Not tainted 6.14.0-33-generic #33~24.04.1-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:apache2 state:D stack:0 pid:2566 tgid:2566 ppid:2271 task_flags:0x400140 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x2cf/0x640
schedule+0x29/0xd0
io_schedule+0x4c/0x80
folio_wait_bit_common+0x138/0x310
? __pfx_wake_page_function+0x10/0x10
folio_wait_private_2+0x2c/0x60
nfs_invalidate_folio+0x84/0x110 [nfs]
truncate_cleanup_folio+0xaa/0xd0
truncate_inode_pages_range+0x140/0x560
? __call_rcu_nocb_wake+0x17d/0x270
truncate_pagecache+0x48/0x70
nfs_setattr_update_inode+0x30e/0x3d0 [nfs]
nfs3_proc_setattr+0x108/0x150 [nfsv3]
nfs_setattr+0x197/0x380 [nfs]
notify_change+0x2fa/0x4f0
do_truncate+0x98/0xf0
? do_truncate+0x98/0xf0
do_open+0x2f0/0x430
path_openat+0x134/0x2d0
do_filp_open+0xd4/0x1a0
do_sys_openat2+0xb3/0xe0
? post_alloc_hook+0xc9/0x140
__x64_sys_openat+0x55/0xa0
x64_sys_call+0x1c49/0x2650
do_syscall_64+0x7e/0x170
? __alloc_frozen_pages_noprof+0x164/0x330
? try_charge_memcg+0x8e/0x5a0
? __mod_memcg_lruvec_state+0xf4/0x250
? __lruvec_stat_mod_folio+0x8b/0xf0
? set_ptes.isra.0+0x3b/0x90
? do_anonymous_page+0x132/0x470
? handle_pte_fault+0x1e1/0x200
? __handle_mm_fault+0x62c/0x770
? __count_memcg_events+0xd3/0x1a0
? count_memcg_events.constprop.0+0x2a/0x50
? handle_mm_fault+0x1df/0x2d0
? do_user_addr_fault+0x5d5/0x870
? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
? irqentry_exit_to_user_mode+0x2d/0x1d0
? irqentry_exit+0x43/0x50
? exc_page_fault+0x96/0x1e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7be9b631b215
RSP: 002b:00007ffd47a5a600 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000241 RCX: 00007be9b631b215
RDX: 0000000000000241 RSI: 00007ffd47a5a6b0 RDI: 00000000ffffff9c
RBP: 00007ffd47a5a670 R08: 0000000000000000 R09: 000000000000006e
R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffd47a5a6b0
R13: 00007ffd47a5a6b0 R14: 00007ffd47a5b7c5 R15: 0000000000000000
</TASK>
This happened while an Apache / mod_php process tried to access a file on
an NFS mount. The mount used the following options:
nofail,nfsvers=3,fsc,tcp,intr,soft,retrans=3,timeo=10,retry=1,ac,lookupcache=positive,acregmin=1,acdirmin=1,noexec,nosuid,noatime
but the hang still lasted for over 10 hours, until the client was
hard-rebooted with "echo b > /proc/sysrq-trigger".
The mount used a redundant 10 Gbps fiber link, and it is still unclear
how much traffic was flowing when the hang started.
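The point that a `soft` mount with `timeo=10,retrans=3` should not hang for hours can be made concrete with a rough estimate. The sketch below assumes the linear backoff that the nfs(5) man page describes for NFS over TCP (`timeo` is in tenths of a second, each retransmission waits one more `timeo`, and no single wait exceeds 600 seconds). The helper names are hypothetical, and the kernel's actual retry logic may differ in detail.

```python
from typing import Dict, Optional

# Mount options copied from the report above.
OPTIONS = (
    "nofail,nfsvers=3,fsc,tcp,intr,soft,retrans=3,timeo=10,retry=1,ac,"
    "lookupcache=positive,acregmin=1,acdirmin=1,noexec,nosuid,noatime"
)


def parse_mount_options(opts: str) -> Dict[str, Optional[str]]:
    """Split a comma-separated mount option string into {name: value or None}."""
    parsed: Dict[str, Optional[str]] = {}
    for item in opts.split(","):
        name, sep, value = item.partition("=")
        parsed[name] = value if sep else None
    return parsed


def soft_timeout_seconds(timeo_ds: int, retrans: int, cap_s: float = 600.0) -> float:
    """Rough time a soft NFS-over-TCP mount waits before failing one request.

    Assumption (from the nfs(5) description): the first wait is timeo, each
    retransmission adds another timeo, and no single wait exceeds cap_s.
    """
    step = timeo_ds / 10.0  # timeo is given in tenths of a second
    # One initial transmission followed by `retrans` retransmissions.
    return sum(min(step * n, cap_s) for n in range(1, retrans + 2))


opts = parse_mount_options(OPTIONS)
print(soft_timeout_seconds(int(opts["timeo"]), int(opts["retrans"])))  # 10.0
```

Under that assumption, a single stalled RPC on this mount would be expected to fail after roughly 10 seconds (1 + 2 + 3 + 4), not after 10 hours.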
The system was running the following kernel packages at the time of the
hang:
linux-lowlatency-hwe-24.04
linux-image-generic-hwe-24.04
linux-image-6.14.0-33-generic
linux-modules-6.14.0-33-generic
linux-modules-extra-6.14.0-33-generic
and the exact version for all of the above was 6.14.0-33.33~24.04.1
Other clients with identical connections to the same NFS server seemed
to work fine at the same time, so this is probably some kind of race
condition in the kernel. Judging from this discussion, I'd guess that
every kernel since the 5.x series has this as-yet-unknown failure mode.
This system had previously run 5.15.x kernels for a long time without
any issues.
--
You received this bug notification because you are a member of
Debcrafters packages, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/2062568
Title:
nfsd gets unresponsive after some hours of operation
Status in linux package in Ubuntu:
In Progress
Status in nfs-utils package in Ubuntu:
Incomplete
Status in linux source package in Noble:
In Progress
Status in nfs-utils source package in Noble:
Incomplete
Bug description:
I installed the 24.04 Beta on two test machines that had previously run
22.04 without issues. One of them exports two volumes that are
mounted by the other machine, which primarily uses them as secondary
storage for ccache.
After a couple of hours of uptime (this has happened twice since
yesterday evening), nfsd on the machine exporting the volumes appears
to hang on something.
From dmesg on the server (repeated a few times):
[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
[11183.290558] Not tainted 6.8.0-22-generic #22-Ubuntu
[11183.290563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11183.290582] task:nfsd state:D stack:0 pid:1419 tgid:1419 ppid:2 flags:0x00004000
[11183.290587] Call Trace:
[11183.290602] <TASK>
[11183.290606] __schedule+0x27c/0x6b0
[11183.290612] schedule+0x33/0x110
[11183.290615] schedule_timeout+0x157/0x170
[11183.290619] wait_for_completion+0x88/0x150
[11183.290623] __flush_workqueue+0x140/0x3e0
[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
[11183.290744] nfsd4_proc_compound+0x3af/0x770 [nfsd]
[11183.290798] nfsd_dispatch+0xd4/0x220 [nfsd]
[11183.290851] svc_process_common+0x44d/0x710 [sunrpc]
[11183.290924] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[11183.290976] svc_process+0x132/0x1b0 [sunrpc]
[11183.291041] svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
[11183.291105] svc_recv+0x18b/0x2e0 [sunrpc]
[11183.291168] ? __pfx_nfsd+0x10/0x10 [nfsd]
[11183.291220] nfsd+0x8b/0xe0 [nfsd]
[11183.291270] kthread+0xef/0x120
[11183.291274] ? __pfx_kthread+0x10/0x10
[11183.291276] ret_from_fork+0x44/0x70
[11183.291279] ? __pfx_kthread+0x10/0x10
[11183.291281] ret_from_fork_asm+0x1b/0x30
[11183.291286] </TASK>
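When comparing reports like these, it can help to pull the task name, PID and blocked time out of the hung-task header lines. A minimal sketch, with a hypothetical `parse_hung_task` helper whose regular expression is written only against the two header lines quoted in this report (one with a dmesg timestamp, one without); other kernel versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional

# Matches the hung-task header as quoted in this report, with or without a
# leading "[seconds.micros]" dmesg timestamp.
HUNG_TASK_RE = re.compile(
    r"^(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


class HungTask(NamedTuple):
    comm: str                  # task (command) name, e.g. "nfsd"
    pid: int
    blocked_s: int             # "blocked for more than N seconds"
    timestamp: Optional[float] # dmesg timestamp, if present


def parse_hung_task(line: str) -> Optional[HungTask]:
    """Return the details of a hung-task header line, or None if it is not one."""
    m = HUNG_TASK_RE.match(line.strip())
    if m is None:
        return None
    ts = float(m["ts"]) if m["ts"] else None
    return HungTask(m["comm"], int(m["pid"]), int(m["secs"]), ts)


log = [
    "INFO: task apache2:2566 blocked for more than 122 seconds.",
    "[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.",
    "[ 6596.911785] RPC: Could not send backchannel reply error: -110",
]
for entry in log:
    print(parse_hung_task(entry))
```

Lines that are not hung-task headers, such as the RPC error above, simply yield `None`.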
From dmesg on the client (repeated a number of times):
[ 6596.911785] RPC: Could not send backchannel reply error: -110
[ 6596.972490] RPC: Could not send backchannel reply error: -110
[ 6837.281307] RPC: Could not send backchannel reply error: -110
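The `-110` in these client messages follows the kernel convention of reporting negated errno values; on Linux, errno 110 is `ETIMEDOUT`. A small sketch that decodes such a line with Python's standard `errno` and `os.strerror`; the `decode_backchannel_error` helper and its pattern are hypothetical and based only on the lines quoted above.

```python
import errno
import os
import re
from typing import Optional, Tuple

# Pattern written against the client messages quoted in this report.
BACKCHANNEL_RE = re.compile(r"RPC: Could not send backchannel reply error: (-?\d+)")


def decode_backchannel_error(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (raw code, errno name, description) for a backchannel error line."""
    m = BACKCHANNEL_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    # Kernel code reports errors as negative errno values, so -110 means errno 110.
    num = abs(code)
    return code, errno.errorcode.get(num, "unknown"), os.strerror(num)


print(decode_backchannel_error("[ 6596.911785] RPC: Could not send backchannel reply error: -110"))
# On Linux with glibc: (-110, 'ETIMEDOUT', 'Connection timed out')
```

The description string comes from the C library, so its exact wording can differ between libc implementations.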
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: nfs-kernel-server 1:2.6.4-3ubuntu5
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
.etc.request-key.d.id_resolver.conf: create id_resolver * * /usr/sbin/nfsidmap -t 600 %k %d
ApportVersion: 2.28.1-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
Date: Fri Apr 19 14:10:25 2024
InstallationDate: Installed on 2024-04-16 (3 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Beta amd64 (20240410.1)
NFSMounts:
NFSv4Mounts:
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
XDG_RUNTIME_DIR=<set>
SourcePackage: nfs-utils
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2062568/+subscriptions