-current 100% CPU, softdep related

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

-current 100% CPU, softdep related

Mark Patruck
Hi,

i've stumbled upon an issue with the following change to -current
about 8 weeks ago:

/sys/ufs/ffs/ffs_softdep.c r1.146

https://marc.info/?l=openbsd-cvs&m=157815505920104&w=2

All systems (softdep enabled) were running -current dated 23 Dec
2019 and after updating to -current 13 Jan 2020, one system hosting
a fairly busy postgres database became unresponsive and started to
eat up 100% cpu...about 30m - 6h after the last boot.

The system runs as VM on ESXi 6.7u3 (AMD Epyc gen1) and unfortunately...
even after several weeks of testing it didn't crash a single time, so
no ddb, no entries in /var/log...the system is simply "busy" and won't
react to anything (hypervisor monitoring shows that as soon as the
system becomes unresponsive, cpu utilization goes up to 100%). I've
tested with another VM running on ESXi 6.7u3 (AMD Epyc gen2)...same
results.

In order to avoid any hardware related issue, i've setup the exact
same OpenBSD-current + configuration on a bare metal Intel platform
(Supermicro X10SLM-f Xeon E3-1230v3) with a serial cable connected.
A few hours after boot it showed the identical error. (i did several
tests over two weeks). Unfortunately,
even with

ddb.panic=1
ddb.console=1

and sending a BREAK with #~ (cu -l /dev/cuaU0 -115200) i don't get
any helpful info (no trace or ps).

~#
The following connections are open:
   #1 client-session (t4 r0 i0/0 o0/0 e[write]/0 fd 5/6/7 sock -1 cc -1)
   #2 mux-control (t16 r3 i0/0 o0/0 e[closed]/0 fd 8/8/-1 sock 8 cc -1)
   #3 client-session (t4 r1 i0/0 o0/0 e[write]/0 fd 9/10/11 sock -1 cc 2)
   #4 mux-control (t16 r5 i0/0 o0/0 e[closed]/0 fd 12/12/-1 sock 12 cc -1)
   #5 client-session (t4 r2 i0/0 o0/0 e[write]/4 fd 13/14/15 sock -1 cc 4)

It won't react to commands.

So as mentioned in the beginning, when running a fresh -current
kernel but w/o the change in /sys/ufs/ffs/ffs_softdep.c r1.146, the
system is stable, even after weeks...and...the same applies for an
unmodified -current kernel, but with mounting /var/postgresql w/o
softdep.

Thanks in advance for any info - especially to get more debugging
output.

     -Mark

--
Mark Patruck ( mark at wrapped.cx )
GPG key 0xF2865E51 / 187F F6D3 EE04 1DCE 1C74  F644 0D3C F66F F286 5E51

https://www.wrapped.cx

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Todd C. Miller-3
This sounds like the loop in softdep_process_worklist() is never
exiting.  It shouldn't run for more than a second, though.

FreeBSD breaks out of the loop if process_worklist_item() can't
make progress.  You could try the following (untested) diff to see
if it changes the behavior.

 - todd

Index: /sys/ufs/ffs/ffs_softdep.c
===================================================================
RCS file: /cvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.148
diff -u -p -u -r1.148 ffs_softdep.c
--- /sys/ufs/ffs/ffs_softdep.c 4 Feb 2020 04:09:11 -0000 1.148
+++ /sys/ufs/ffs/ffs_softdep.c 28 Feb 2020 20:42:39 -0000
@@ -591,7 +591,7 @@ int
 softdep_process_worklist(struct mount *matchmnt)
 {
  struct proc *p = CURPROC;
- int matchcnt, loopcount;
+ int cnt, matchcnt, loopcount;
  struct timeval starttime;
 
  /*
@@ -639,7 +639,9 @@ softdep_process_worklist(struct mount *m
  loopcount = 1;
  getmicrouptime(&starttime);
  while (num_on_worklist > 0) {
- matchcnt += process_worklist_item(matchmnt, LK_NOWAIT);
+ if ((cnt = process_worklist_item(matchmnt, LK_NOWAIT)) == 0)
+ break;
+ matchcnt += cnt;
 
  /*
  * If a umount operation wants to run the worklist

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Mark Patruck
On 2020-02-28 21:57, Todd C. Miller wrote:
> This sounds like the loop in softdep_process_worklist() is never
> exiting.  It shouldn't run for more than a second, though.
>
> FreeBSD breaks out of the loop if process_worklist_item() can't
> make progress.  You could try the following (untested) diff to see
> if it changes the behavior.

After ~11h with your diff, the system was still up and running,
so i decided to reboot. It crashed while syncing disks.


syncing disks...uvm_fault(0xfffffd83a134d668, 0x20, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      handle_workitem_freefile+0x2a:  movq    0x20(%rax),%rcx
ddb{0}>

ddb{0}> trace
handle_workitem_freefile(fffffd8369e3be70) at handle_workitem_freefile+0x2a
process_worklist_item(ffff8000001b8800,40) at process_worklist_item+0x1c9
softdep_process_worklist(ffff8000001b8800) at softdep_process_worklist+0xed
softdep_flushworklist(ffff8000001b8800,ffff800022516218,ffff800022556d58) at so
ftdep_flushworklist+0xb8
ffs_sync(ffff8000001b8800,1,0,fffffd841f7c2480,ffff800022556d58) at ffs_sync+0xdd
dounmount_leaf(ffff8000001b8800,80000,ffff800022556d58) at dounmount_leaf+0xaa
dounmount(ffff8000001b8800,80000,ffff800022556d58) at dounmount+0xfc
vfs_unmountall() at vfs_unmountall+0x8e
vfs_shutdown(ffff800022556d58) at vfs_shutdown+0x3b
boot(0) at boot+0x6c
reboot(0) at reboot+0x5c
sys_reboot(ffff800022556d58,ffff800022516470,ffff8000225164d0) at sys_reboot+0x7e
syscall(ffff800022516540) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffee610, count: -14


ddb{0}> ps
    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*47072  417851      1      0  7         0x3                reboot
  95768  197147      0      0  3     0x14200  bored         smr
  23039  211890      0      0  2     0x14200                zerothread
  35233   64895      0      0  3     0x14200  aiodoned      aiodoned
  71347  316690      0      0  3     0x14200  syncer        update
  44681  504129      0      0  3     0x14200  cleaner       cleaner
  90819  193944      0      0  3     0x14200  reaper        reaper
  96855  230075      0      0  3     0x14200  pgdaemon      pagedaemon
  47917   12510      0      0  3     0x14200  bored         crynlk
  20365   68713      0      0  3     0x14200  bored         crypto
  21791  476177      0      0  3     0x14200  usbtsk        usbtask
  70170  396785      0      0  3     0x14200  usbatsk       usbatsk
  44347   23905      0      0  3  0x40014200  acpi0         acpi0
  27912   30285      0      0  7  0x40014200                idle7
  22700  306163      0      0  7  0x40014200                idle6
  63705  100085      0      0  7  0x40014200                idle5
  17377  411898      0      0  7  0x40014200                idle4
  84659  510633      0      0  7  0x40014200                idle3
  54830  118160      0      0  7  0x40014200                idle2
  54818  330925      0      0  7  0x40014200                idle1
  70767  292562      0      0  3     0x14200  bored         sensors
  44681  504129      0      0  3     0x14200  cleaner       cleaner
  90819  193944      0      0  3     0x14200  reaper        reaper
  96855  230075      0      0  3     0x14200  pgdaemon      pagedaemon
  47917   12510      0      0  3     0x14200  bored         crynlk
  20365   68713      0      0  3     0x14200  bored         crypto
  21791  476177      0      0  3     0x14200  usbtsk        usbtask
  70170  396785      0      0  3     0x14200  usbatsk       usbatsk
  44347   23905      0      0  3  0x40014200  acpi0         acpi0
  27912   30285      0      0  7  0x40014200                idle7
  22700  306163      0      0  7  0x40014200                idle6
  63705  100085      0      0  7  0x40014200                idle5
  17377  411898      0      0  7  0x40014200                idle4
  84659  510633      0      0  7  0x40014200                idle3
  54830  118160      0      0  7  0x40014200                idle2
  54818  330925      0      0  7  0x40014200                idle1
  70767  292562      0      0  3     0x14200  bored         sensors
  59264  346295      0      0  3     0x14200  bored         softnet
  10461  133156      0      0  3     0x14200  bored         systqmp
   1863  323920      0      0  3     0x14200  bored         systq
  11411  181100      0      0  2  0x40014200                softclock
   3128  136983      0      0  3  0x40014200                idle0
      1   52990      0      0  3        0x82  wait          init
      0       0     -1      0  3     0x10200  scheduler     swapper


ddb{0}> show registers
rdi               0xfffffd8369e3be70
rsi                             0x40
rbp               0xffff800022515f50
rbx                              0x1
rdx               0xfe00000007ff1e3a
rcx                            0x202
rax                                0
r8                                 0
r9                0xffffffffffffffff
r10               0x208d459308eb8ce2
r11               0x572f46b22ffb6c93
r12                             0x40
r13                              0x1
r14               0xfffffd8369e3be70
r15               0xffff8000001b8800
rip               0xffffffff812fb35a    handle_workitem_freefile+0x2a
cs                               0x8
rflags                       0x10282    __ALIGN_SIZE+0xf282
rsp               0xffff800022515d70
ss                              0x10
handle_workitem_freefile+0x2a:  movq    0x20(%rax),%rcx


        -Mark

>   - todd
>
> Index: /sys/ufs/ffs/ffs_softdep.c
> ===================================================================
> RCS file: /cvs/src/sys/ufs/ffs/ffs_softdep.c,v
> retrieving revision 1.148
> diff -u -p -u -r1.148 ffs_softdep.c
> --- /sys/ufs/ffs/ffs_softdep.c 4 Feb 2020 04:09:11 -0000 1.148
> +++ /sys/ufs/ffs/ffs_softdep.c 28 Feb 2020 20:42:39 -0000
> @@ -591,7 +591,7 @@ int
>   softdep_process_worklist(struct mount *matchmnt)
>   {
>   struct proc *p = CURPROC;
> - int matchcnt, loopcount;
> + int cnt, matchcnt, loopcount;
>   struct timeval starttime;
>  
>   /*
> @@ -639,7 +639,9 @@ softdep_process_worklist(struct mount *m
>   loopcount = 1;
>   getmicrouptime(&starttime);
>   while (num_on_worklist > 0) {
> - matchcnt += process_worklist_item(matchmnt, LK_NOWAIT);
> + if ((cnt = process_worklist_item(matchmnt, LK_NOWAIT)) == 0)
> + break;
> + matchcnt += cnt;
>  
>   /*
>   * If a umount operation wants to run the worklist
>

--
Mark Patruck ( mark at wrapped.cx )
GPG key 0xF2865E51 / 187F F6D3 EE04 1DCE 1C74  F644 0D3C F66F F286 5E51

https://www.wrapped.cx

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Mark Patruck
After ~3 days with the system up and running, the crash after doing
a "reboot" looks different. Now it's in handle_workitem_freeblocks(),
according to objdump

/sys/ufs/ffs/ffs_softdep.c:2365
     120e:       48 8b 47 28             mov    0x28(%rdi),%rax
     1212:       48 8b 80 98 02 00 00    mov    0x298(%rax),%rax
--> 1219:       48 83 78 18 01          cmpq   $0x1,0x18(%rax)
     121e:       48 8d 8d b8 fd ff ff    lea    0xfffffffffffffdb8(%rbp),%rcx
     1225:       48 89 4d 98             mov    %rcx,0xffffffffffffff98(%rbp)


syncing disks...uvm_fault(0xfffffd83a1b78228, 0x18, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      handle_workitem_freeblocks+0x39:        cmpq    $0x1,0x18(%rax)


ddb{0}> trace
handle_workitem_freeblocks(fffffd82b482bce8) at handle_workitem_freeblocks+0x39
process_worklist_item(ffff8000001b8800,40) at process_worklist_item+0x1f2
softdep_process_worklist(ffff8000001b8800) at softdep_process_worklist+0xed
softdep_flushworklist(ffff8000001b8800,ffff8000225ca708,ffff8000226deb00) at so
ftdep_flushworklist+0xb8
ffs_sync(ffff8000001b8800,1,0,fffffd841f7c26c0,ffff8000226deb00) at ffs_sync+0xdd
dounmount_leaf(ffff8000001b8800,80000,ffff8000226deb00) at dounmount_leaf+0xaa
dounmount(ffff8000001b8800,80000,ffff8000226deb00) at dounmount+0xfc
vfs_unmountall() at vfs_unmountall+0x8e
vfs_shutdown(ffff8000226deb00) at vfs_shutdown+0x3b
boot(0) at boot+0x6c
reboot(0) at reboot+0x5c
sys_reboot(ffff8000226deb00,ffff8000225ca960,ffff8000225ca9c0) at sys_reboot+0x7e
syscall(ffff8000225caa30) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffdc410, count: -14


ddb{0}> ps
    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*50794  484816      1      0  7         0x3                reboot
  19349  244035      0      0  3     0x14200  bored         smr
  91090  149880      0      0  2     0x14200                zerothread
  90933  308106      0      0  3     0x14200  aiodoned      aiodoned
   3559    3994      0      0  3     0x14200  syncer        update
  13412  220874      0      0  3     0x14200  cleaner       cleaner
  67192  445479      0      0  3     0x14200  reaper        reaper
  16869  348395      0      0  3     0x14200  pgdaemon      pagedaemon
  59176  404002      0      0  3     0x14200  bored         crynlk
  70887  377538      0      0  3     0x14200  bored         crypto
  59125  295509      0      0  3     0x14200  usbtsk        usbtask
  73888  331304      0      0  3     0x14200  usbatsk       usbatsk
  87760  317235      0      0  3  0x40014200  acpi0         acpi0
  32435  334346      0      0  7  0x40014200                idle7
  97709  444879      0      0  7  0x40014200                idle6
  85595  223792      0      0  7  0x40014200                idle5
  89353  449747      0      0  7  0x40014200                idle4
  89616   53984      0      0  7  0x40014200                idle3
  17444  112424      0      0  7  0x40014200                idle2
  50503  439810      0      0  7  0x40014200                idle1
  70433  274545      0      0  3     0x14200  bored         sensors
  58599   63095      0      0  3     0x14200  bored         softnet
   4228   56259      0      0  3     0x14200  bored         systqmp
  94385  233288      0      0  3     0x14200  bored         systq
   7159  306934      0      0  3  0x40014200  bored         softclock
  67585  249534      0      0  3  0x40014200                idle0
      1  208459      0      0  3        0x82  wait          init
      0       0     -1      0  3     0x10200  scheduler     swapper


ddb{0}> show registers
rdi               0xfffffd82b482bce8
rsi                             0x40
rbp               0xffff8000225ca440
rbx                              0x1
rdx               0xfe00000007ff1e3a
rcx                            0x286
rax                                0
r8                               0x8
r9                               0x1
r10               0xcbb4b964dc1cd4c3
r11               0xab8da676dd4aa070
r12                             0x40
r13                              0x9
r14               0xfffffd82b482bce8
r15               0xffff8000001b8800
rip               0xffffffff812b3589    handle_workitem_freeblocks+0x39
cs                               0x8
rflags                       0x10286    __ALIGN_SIZE+0xf286
rsp               0xffff8000225ca1f0
ss                              0x10
handle_workitem_freeblocks+0x39:        cmpq    $0x1,0x18(%rax)


On 2020-02-29 10:01, Mark Patruck wrote:

> On 2020-02-28 21:57, Todd C. Miller wrote:
>> This sounds like the loop in softdep_process_worklist() is never
>> exiting.  It shouldn't run for more than a second, though.
>>
>> FreeBSD breaks out of the loop if process_worklist_item() can't
>> make progress.  You could try the following (untested) diff to see
>> if it changes the behavior.
>
> After ~11h with your diff, the system was still up and running,
> so i decided to reboot. It crashed while syncing disks.
>
>
> syncing disks...uvm_fault(0xfffffd83a134d668, 0x20, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at      handle_workitem_freefile+0x2a:  movq    0x20(%rax),%rcx
> ddb{0}>
>
SNIP

--
Mark Patruck ( mark at wrapped.cx )
GPG key 0xF2865E51 / 187F F6D3 EE04 1DCE 1C74  F644 0D3C F66F F286 5E51

https://www.wrapped.cx

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Todd C. Miller-3
On Tue, 03 Mar 2020 10:20:04 +0100, Mark Patruck wrote:

> After ~3 days with the system up and running, the crash after doing
> a "reboot" looks different. Now it's in handle_workitem_freeblocks(),
> according to objdump

The problem is that removed files were not actually getting removed
since process_worklist_item() was returning 0 even when it performed
work, causing the loop to exit early.  You can see this by running
df after removing files--the available blocks are returned to the
system very slowly if at all.

I've been testing the following diff which seems to be stable,
though it contains a couple different changes which should probably
be split up.

1) Return the count in process_worklist_item() regardless of the
   value of matchmnt (which can be removed from the args).

2) Don't pass LK_NOWAIT to process_worklist_item() when unmounting
   the file system.

3) Simpler starttime check in softdep_process_worklist() that
   is consistent with the syncer.  The check is now skipped if we
   are unmounting.

 - todd

Index: sys/kern/vfs_sync.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_sync.c,v
retrieving revision 1.63
diff -u -p -u -r1.63 vfs_sync.c
--- sys/kern/vfs_sync.c 20 Jan 2020 23:21:56 -0000 1.63
+++ sys/kern/vfs_sync.c 3 Mar 2020 14:51:15 -0000
@@ -53,7 +53,7 @@
 #include <sys/sched.h>
 
 #ifdef FFS_SOFTUPDATES
-int   softdep_process_worklist(struct mount *);
+int   softdep_process_worklist(struct mount *, int);
 #endif
 
 /*
@@ -203,7 +203,7 @@ syncer_thread(void *arg)
  /*
  * Do soft update processing.
  */
- softdep_process_worklist(NULL);
+ softdep_process_worklist(NULL, 0);
 #endif
 
  /*
Index: sys/ufs/ffs/ffs_extern.h
===================================================================
RCS file: /cvs/src/sys/ufs/ffs/ffs_extern.h,v
retrieving revision 1.45
diff -u -p -u -r1.45 ffs_extern.h
--- sys/ufs/ffs/ffs_extern.h 20 Jan 2020 23:21:56 -0000 1.45
+++ sys/ufs/ffs/ffs_extern.h 3 Mar 2020 14:51:15 -0000
@@ -168,10 +168,10 @@ struct vop_vfree_args;
 struct vop_fsync_args;
 
 void  softdep_initialize(void);
-int   softdep_process_worklist(struct mount *);
+int   softdep_process_worklist(struct mount *, int);
 int   softdep_mount(struct vnode *, struct mount *, struct fs *,
           struct ucred *);
-int   softdep_flushworklist(struct mount *, int *, struct proc *);
+int   softdep_flushworklist(struct mount *, int *, struct proc *, int);
 int   softdep_flushfiles(struct mount *, int, struct proc *);
 void  softdep_update_inodeblock(struct inode *, struct buf *, int);
 void  softdep_load_inodeblock(struct inode *);
Index: sys/ufs/ffs/ffs_softdep.c
===================================================================
RCS file: /cvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.148
diff -u -p -u -r1.148 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c 4 Feb 2020 04:09:11 -0000 1.148
+++ sys/ufs/ffs/ffs_softdep.c 3 Mar 2020 14:51:15 -0000
@@ -163,7 +163,7 @@ STATIC int inodedep_lookup(struct fs *,
 STATIC int pagedep_lookup(struct inode *, daddr_t, int, struct pagedep **);
 STATIC void pause_timer(void *);
 STATIC int request_cleanup(int, int);
-STATIC int process_worklist_item(struct mount *, int);
+STATIC int process_worklist_item(int);
 STATIC void add_to_worklist(struct worklist *);
 
 /*
@@ -588,11 +588,11 @@ add_to_worklist(struct worklist *wk)
  * until all the old ones have been purged from the dependency lists.
  */
 int
-softdep_process_worklist(struct mount *matchmnt)
+softdep_process_worklist(struct mount *matchmnt, int full)
 {
  struct proc *p = CURPROC;
- int matchcnt, loopcount;
- struct timeval starttime;
+ int cnt, matchcnt, loopcount;
+ time_t starttime;
 
  /*
  * First process any items on the delayed-free queue.
@@ -637,9 +637,12 @@ softdep_process_worklist(struct mount *m
  wakeup_one(&proc_waiting);
  }
  loopcount = 1;
- getmicrouptime(&starttime);
+ starttime = time_second;
  while (num_on_worklist > 0) {
- matchcnt += process_worklist_item(matchmnt, LK_NOWAIT);
+ cnt = process_worklist_item(full ? 0 : LK_NOWAIT);
+ if (cnt == 0)
+ break;
+ matchcnt += cnt;
 
  /*
  * If a umount operation wants to run the worklist
@@ -676,17 +679,8 @@ softdep_process_worklist(struct mount *m
  * second. Otherwise the other syncer tasks may get
  * excessively backlogged.
  */
- {
- struct timeval diff;
- struct timeval tv;
-
- getmicrouptime(&tv);
- timersub(&tv, &starttime, &diff);
- if (diff.tv_sec != 0 && matchmnt == NULL) {
- matchcnt = -1;
- break;
- }
- }
+ if (!full && starttime != time_second)
+ break;
 
  /*
  * Process any new items on the delayed-free queue.
@@ -707,7 +701,7 @@ softdep_process_worklist(struct mount *m
  * Process one item on the worklist.
  */
 STATIC int
-process_worklist_item(struct mount *matchmnt, int flags)
+process_worklist_item(int flags)
 {
  struct worklist *wk, *wkend;
  struct dirrem *dirrem;
@@ -761,8 +755,7 @@ process_worklist_item(struct mount *matc
  panic("%s: dirrem on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ matchcnt += 1;
  handle_workitem_remove(WK_DIRREM(wk));
  break;
 
@@ -774,8 +767,7 @@ process_worklist_item(struct mount *matc
  panic("%s: freeblks on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ matchcnt += 1;
  handle_workitem_freeblocks(WK_FREEBLKS(wk));
  break;
 
@@ -787,8 +779,7 @@ process_worklist_item(struct mount *matc
  panic("%s: freefrag on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ matchcnt += 1;
  handle_workitem_freefrag(WK_FREEFRAG(wk));
  break;
 
@@ -800,8 +791,7 @@ process_worklist_item(struct mount *matc
  panic("%s: freefile on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ matchcnt += 1;
  handle_workitem_freefile(WK_FREEFILE(wk));
  break;
 
@@ -840,7 +830,7 @@ softdep_move_dependencies(struct buf *ol
  * Purge the work list of all items associated with a particular mount point.
  */
 int
-softdep_flushworklist(struct mount *oldmnt, int *countp, struct proc *p)
+softdep_flushworklist(struct mount *oldmnt, int *countp, struct proc *p, int full)
 {
  struct vnode *devvp;
  int count, error = 0;
@@ -862,7 +852,7 @@ softdep_flushworklist(struct mount *oldm
  */
  *countp = 0;
  devvp = VFSTOUFS(oldmnt)->um_devvp;
- while ((count = softdep_process_worklist(oldmnt)) > 0) {
+ while ((count = softdep_process_worklist(oldmnt, full)) > 0) {
  *countp += count;
  vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY);
  error = VOP_FSYNC(devvp, p->p_ucred, MNT_WAIT, p);
@@ -897,7 +887,7 @@ softdep_flushfiles(struct mount *oldmnt,
  */
  if ((error = ffs_flushfiles(oldmnt, flags, p)) != 0)
  break;
- if ((error = softdep_flushworklist(oldmnt, &count, p)) != 0 ||
+ if ((error = softdep_flushworklist(oldmnt, &count, p, 1)) != 0 ||
     count == 0)
  break;
  }
@@ -5253,8 +5243,8 @@ request_cleanup(int resource, int islock
  atomic_setbits_int(&p->p_flag, P_SOFTDEP);
  if (islocked)
  FREE_LOCK(&lk);
- process_worklist_item(NULL, LK_NOWAIT);
- process_worklist_item(NULL, LK_NOWAIT);
+ process_worklist_item(LK_NOWAIT);
+ process_worklist_item(LK_NOWAIT);
  atomic_clearbits_int(&p->p_flag, P_SOFTDEP);
  stat_worklist_push += 2;
  if (islocked)
Index: sys/ufs/ffs/ffs_softdep_stub.c
===================================================================
RCS file: /cvs/src/sys/ufs/ffs/ffs_softdep_stub.c,v
retrieving revision 1.18
diff -u -p -u -r1.18 ffs_softdep_stub.c
--- sys/ufs/ffs/ffs_softdep_stub.c 11 Jun 2013 16:42:18 -0000 1.18
+++ sys/ufs/ffs/ffs_softdep_stub.c 3 Mar 2020 14:51:15 -0000
@@ -168,7 +168,7 @@ softdep_fsync_mountdev(struct vnode *vp,
 }
 
 int
-softdep_flushworklist(struct mount *oldmnt, int *countp, struct proc *p)
+softdep_flushworklist(struct mount *oldmnt, int *countp, struct proc *p, int)
 {
  *countp = 0;
  return (0);
Index: sys/ufs/ffs/ffs_vfsops.c
===================================================================
RCS file: /cvs/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.183
diff -u -p -u -r1.183 ffs_vfsops.c
--- sys/ufs/ffs/ffs_vfsops.c 21 Feb 2020 11:11:15 -0000 1.183
+++ sys/ufs/ffs/ffs_vfsops.c 3 Mar 2020 14:51:15 -0000
@@ -1257,7 +1257,7 @@ ffs_sync(struct mount *mp, int waitfor,
  * Force stale file system control information to be flushed.
  */
  if ((ump->um_mountp->mnt_flag & MNT_SOFTDEP) && waitfor == MNT_WAIT) {
- if ((error = softdep_flushworklist(ump->um_mountp, &count, p)))
+ if ((error = softdep_flushworklist(ump->um_mountp, &count, p, 0)))
  allerror = error;
  /* Flushed work items may create new vnodes to clean */
  if (count)

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Todd C. Miller-3
In reply to this post by Mark Patruck
Here is a minimal fix that only addresses the tight CPU loop in
softdep_process_worklist().  It will exit the loop if we cannot
make progress instead of spinning.

process_worklist_item() now returns 1 if it processed an item or 0
if it could not.  The existing semantics of matchcnt have been
preserved.

 - todd

Index: ffs_softdep.c
===================================================================
RCS file: /cvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.148
diff -u -p -u -r1.148 ffs_softdep.c
--- ffs_softdep.c 4 Feb 2020 04:09:11 -0000 1.148
+++ ffs_softdep.c 3 Mar 2020 15:34:31 -0000
@@ -163,7 +163,7 @@ STATIC int inodedep_lookup(struct fs *,
 STATIC int pagedep_lookup(struct inode *, daddr_t, int, struct pagedep **);
 STATIC void pause_timer(void *);
 STATIC int request_cleanup(int, int);
-STATIC int process_worklist_item(struct mount *, int);
+STATIC int process_worklist_item(struct mount *, int *, int);
 STATIC void add_to_worklist(struct worklist *);
 
 /*
@@ -639,7 +639,8 @@ softdep_process_worklist(struct mount *m
  loopcount = 1;
  getmicrouptime(&starttime);
  while (num_on_worklist > 0) {
- matchcnt += process_worklist_item(matchmnt, LK_NOWAIT);
+ if (process_worklist_item(matchmnt, &matchcnt, LK_NOWAIT) == 0)
+ break;
 
  /*
  * If a umount operation wants to run the worklist
@@ -707,13 +708,12 @@ softdep_process_worklist(struct mount *m
  * Process one item on the worklist.
  */
 STATIC int
-process_worklist_item(struct mount *matchmnt, int flags)
+process_worklist_item(struct mount *matchmnt, int *matchcnt, int flags)
 {
  struct worklist *wk, *wkend;
  struct dirrem *dirrem;
  struct mount *mp;
  struct vnode *vp;
- int matchcnt = 0;
 
  ACQUIRE_LOCK(&lk);
  /*
@@ -761,8 +761,8 @@ process_worklist_item(struct mount *matc
  panic("%s: dirrem on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ if (matchmnt != NULL && mp == matchmnt)
+ *matchcnt += 1;
  handle_workitem_remove(WK_DIRREM(wk));
  break;
 
@@ -774,8 +774,8 @@ process_worklist_item(struct mount *matc
  panic("%s: freeblks on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ if (matchmnt != NULL && mp == matchmnt)
+ *matchcnt += 1;
  handle_workitem_freeblocks(WK_FREEBLKS(wk));
  break;
 
@@ -787,8 +787,8 @@ process_worklist_item(struct mount *matc
  panic("%s: freefrag on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ if (matchmnt != NULL && mp == matchmnt)
+ *matchcnt += 1;
  handle_workitem_freefrag(WK_FREEFRAG(wk));
  break;
 
@@ -800,8 +800,8 @@ process_worklist_item(struct mount *matc
  panic("%s: freefile on suspended filesystem",
  "process_worklist_item");
 #endif
- if (mp == matchmnt)
- matchcnt += 1;
+ if (matchmnt != NULL && mp == matchmnt)
+ *matchcnt += 1;
  handle_workitem_freefile(WK_FREEFILE(wk));
  break;
 
@@ -810,7 +810,7 @@ process_worklist_item(struct mount *matc
     "softdep", TYPENAME(wk->wk_type));
  /* NOTREACHED */
  }
- return (matchcnt);
+ return (1);
 }
 
 /*
@@ -5253,8 +5253,8 @@ request_cleanup(int resource, int islock
  atomic_setbits_int(&p->p_flag, P_SOFTDEP);
  if (islocked)
  FREE_LOCK(&lk);
- process_worklist_item(NULL, LK_NOWAIT);
- process_worklist_item(NULL, LK_NOWAIT);
+ process_worklist_item(NULL, NULL, LK_NOWAIT);
+ process_worklist_item(NULL, NULL, LK_NOWAIT);
  atomic_clearbits_int(&p->p_flag, P_SOFTDEP);
  stat_worklist_push += 2;
  if (islocked)

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Bob Beck-3
makes sense  to  me and has my ok.  could we see if bluhm@ can be sure this
still works with his workload?

On Tue, Mar 3, 2020 at 08:43 Todd C. Miller <[hidden email]> wrote:

> Here is a minimal fix that only addresses the tight CPU loop in
> softdep_process_worklist().  It will exit the loop if we cannot
> make progress instead of spinning.
>
> process_worklist_item() now returns 1 if it processed an item or 0
> if it could not.  The existing semantics of matchcnt have been
> preserved.
>
>  - todd
>
> Index: ffs_softdep.c
> ===================================================================
> RCS file: /cvs/src/sys/ufs/ffs/ffs_softdep.c,v
> retrieving revision 1.148
> diff -u -p -u -r1.148 ffs_softdep.c
> --- ffs_softdep.c       4 Feb 2020 04:09:11 -0000       1.148
> +++ ffs_softdep.c       3 Mar 2020 15:34:31 -0000
> @@ -163,7 +163,7 @@ STATIC      int inodedep_lookup(struct fs *,
>  STATIC int pagedep_lookup(struct inode *, daddr_t, int, struct pagedep
> **);
>  STATIC void pause_timer(void *);
>  STATIC int request_cleanup(int, int);
> -STATIC int process_worklist_item(struct mount *, int);
> +STATIC int process_worklist_item(struct mount *, int *, int);
>  STATIC void add_to_worklist(struct worklist *);
>
>  /*
> @@ -639,7 +639,8 @@ softdep_process_worklist(struct mount *m
>         loopcount = 1;
>         getmicrouptime(&starttime);
>         while (num_on_worklist > 0) {
> -               matchcnt += process_worklist_item(matchmnt, LK_NOWAIT);
> +               if (process_worklist_item(matchmnt, &matchcnt, LK_NOWAIT)
> == 0)
> +                       break;
>
>                 /*
>                  * If a umount operation wants to run the worklist
> @@ -707,13 +708,12 @@ softdep_process_worklist(struct mount *m
>   * Process one item on the worklist.
>   */
>  STATIC int
> -process_worklist_item(struct mount *matchmnt, int flags)
> +process_worklist_item(struct mount *matchmnt, int *matchcnt, int flags)
>  {
>         struct worklist *wk, *wkend;
>         struct dirrem *dirrem;
>         struct mount *mp;
>         struct vnode *vp;
> -       int matchcnt = 0;
>
>         ACQUIRE_LOCK(&lk);
>         /*
> @@ -761,8 +761,8 @@ process_worklist_item(struct mount *matc
>                         panic("%s: dirrem on suspended filesystem",
>                                 "process_worklist_item");
>  #endif
> -               if (mp == matchmnt)
> -                       matchcnt += 1;
> +               if (matchmnt != NULL && mp == matchmnt)
> +                       *matchcnt += 1;
>                 handle_workitem_remove(WK_DIRREM(wk));
>                 break;
>
> @@ -774,8 +774,8 @@ process_worklist_item(struct mount *matc
>                         panic("%s: freeblks on suspended filesystem",
>                                 "process_worklist_item");
>  #endif
> -               if (mp == matchmnt)
> -                       matchcnt += 1;
> +               if (matchmnt != NULL && mp == matchmnt)
> +                       *matchcnt += 1;
>                 handle_workitem_freeblocks(WK_FREEBLKS(wk));
>                 break;
>
> @@ -787,8 +787,8 @@ process_worklist_item(struct mount *matc
>                         panic("%s: freefrag on suspended filesystem",
>                                 "process_worklist_item");
>  #endif
> -               if (mp == matchmnt)
> -                       matchcnt += 1;
> +               if (matchmnt != NULL && mp == matchmnt)
> +                       *matchcnt += 1;
>                 handle_workitem_freefrag(WK_FREEFRAG(wk));
>                 break;
>
> @@ -800,8 +800,8 @@ process_worklist_item(struct mount *matc
>                         panic("%s: freefile on suspended filesystem",
>                                 "process_worklist_item");
>  #endif
> -               if (mp == matchmnt)
> -                       matchcnt += 1;
> +               if (matchmnt != NULL && mp == matchmnt)
> +                       *matchcnt += 1;
>                 handle_workitem_freefile(WK_FREEFILE(wk));
>                 break;
>
> @@ -810,7 +810,7 @@ process_worklist_item(struct mount *matc
>                     "softdep", TYPENAME(wk->wk_type));
>                 /* NOTREACHED */
>         }
> -       return (matchcnt);
> +       return (1);
>  }
>
>  /*
> @@ -5253,8 +5253,8 @@ request_cleanup(int resource, int islock
>                 atomic_setbits_int(&p->p_flag, P_SOFTDEP);
>                 if (islocked)
>                         FREE_LOCK(&lk);
> -               process_worklist_item(NULL, LK_NOWAIT);
> -               process_worklist_item(NULL, LK_NOWAIT);
> +               process_worklist_item(NULL, NULL, LK_NOWAIT);
> +               process_worklist_item(NULL, NULL, LK_NOWAIT);
>                 atomic_clearbits_int(&p->p_flag, P_SOFTDEP);
>                 stat_worklist_push += 2;
>                 if (islocked)
>
>
Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Todd C. Miller-3
On Sat, 07 Mar 2020 19:35:10 -0700, Bob Beck wrote:

> makes sense  to  me and has my ok.  could we see if bluhm@ can be sure this
> still works with his workload?

Thanks, waiting to see if bluhm@ can confirm this doesn't cause
problems makes sense.  I'm currently travelling but will be home
this evening.

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Alexander Bluhm
On Sun, Mar 08, 2020 at 10:15:08AM -0600, Todd C. Miller wrote:
> On Sat, 07 Mar 2020 19:35:10 -0700, Bob Beck wrote:
>
> > makes sense  to  me and has my ok.  could we see if bluhm@ can be sure this
> > still works with his workload?
>
> Thanks, waiting to see if bluhm@ can confirm this doesn't cause
> problems makes sense.  I'm currently travelling but will be home
> this evening.

Diff makes sense, OK bluhm@

I am trying to get this installed on our 6.6 production server where
all the vnd, softdep and needbuf bugs showed up.

bluhm

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Todd C. Miller-3
In reply to this post by Mark Patruck
I just committed my minimal fix.

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: -current 100% CPU, softdep related

Mark Patruck
I can confirm that the softdep related hangs i saw are gone.
No more issues since the beginning of my tests almost one
week ago.

Thanks,

        -Mark

On 2020-03-09 17:51, Todd C. Miller wrote:
> I just committed my minimal fix.
>
>   - todd
>

--
Mark Patruck ( mark at wrapped.cx )
GPG key 0xF2865E51 / 187F F6D3 EE04 1DCE 1C74  F644 0D3C F66F F286 5E51

https://www.wrapped.cx