Fix tsleep(9) during execve(2)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Fix tsleep(9) during execve(2)

Visa Hankala-2
The recent change of initializing sls_sig = 0 in sleep_setup()
(r1.164 of kern_synch.c) has introduced a regression with execve(2).
execve(2) sets the current process in single thread mode for replacing
the program image. In this mode, sleep_setup_signal() cancels a sleep,
making tsleep(9) with PCATCH return immediately. Previously, when
sls_sig was initialized with value 1, tsleep() with PCATCH returned
EINTR during exec. Now it returns zero.

The previous behaviour was not exactly right but did not seem to cause
apparent harm. The current situation is different. For example, the
call chain shown below can hang. The sleep does not actually happen so
vwaitforio() can spin because the pending I/O does not get finished.

tsleep
vwaitforio
nfs_flush
nfs_close
VOP_CLOSE
vn_closefile
fdrop
closef
fdcloseexec
sys_execve

The hang was reported and offending commit located by Pavel Korovin.

I think the proper way to fix the problem is to tighten the conditions
when sleep_setup_signal() cancels the sleep. Instead of checking
p->p_p->ps_single != NULL, which affects the whole process, the thread
should determine if P_SUSPSINGLE is set in p->p_flag. This limits the
cancelling to the threads that have been suspended for single thread
mode. The thread that runs execve(2) does not have the flag set.

OK?

Index: kern/kern_synch.c
===================================================================
RCS file: src/sys/kern/kern_synch.c,v
retrieving revision 1.166
diff -u -p -r1.166 kern_synch.c
--- kern/kern_synch.c 20 Mar 2020 17:13:51 -0000 1.166
+++ kern/kern_synch.c 22 Mar 2020 04:25:32 -0000
@@ -480,7 +480,7 @@ sleep_setup_signal(struct sleep_state *s
  * stopped, p->p_wchan will be 0 upon return from CURSIG.
  */
  atomic_setbits_int(&p->p_flag, P_SINTR);
- if (p->p_p->ps_single != NULL || (sls->sls_sig = CURSIG(p)) != 0) {
+ if ((p->p_flag & P_SUSPSINGLE) || (sls->sls_sig = CURSIG(p)) != 0) {
  unsleep(p);
  p->p_stat = SONPROC;
  sls->sls_do_sleep = 0;

Reply | Threaded
Open this post in threaded view
|

Re: Fix tsleep(9) during execve(2)

Claudio Jeker
On Sun, Mar 22, 2020 at 04:58:47AM +0000, Visa Hankala wrote:

> The recent change of initializing sls_sig = 0 in sleep_setup()
> (r1.164 of kern_synch.c) has introduced a regression with execve(2).
> execve(2) sets the current process in single thread mode for replacing
> the program image. In this mode, sleep_setup_signal() cancels a sleep,
> making tsleep(9) with PCATCH return immediately. Previously, when
> sls_sig was initialized with value 1, tsleep() with PCATCH returned
> EINTR during exec. Now it returns zero.
>
> The previous behaviour was not exactly right but did not seem to cause
> apparent harm. The current situation is different. For example, the
> call chain shown below can hang. The sleep does not actually happen so
> vwaitforio() can spin because the pending I/O does not get finished.
>
> tsleep
> vwaitforio
> nfs_flush
> nfs_close
> VOP_CLOSE
> vn_closefile
> fdrop
> closef
> fdcloseexec
> sys_execve
>
> The hang was reported and offending commit located by Pavel Korovin.
>
> I think the proper way to fix the problem is to tighten the conditions
> when sleep_setup_signal() cancels the sleep. Instead of checking
> p->p_p->ps_single != NULL, which affects the whole process, the thread
> should determine if P_SUSPSINGLE is set in p->p_flag. This limits the
> cancelling to the threads that have been suspended for single thread
> mode. The thread that runs execve(2) does not have the flag set.
>
> OK?

OK claudio@. The P_SUSPSINGLE check is also used in userret to enter
single_thread_check() so this makes this code behave the same way.
 

> Index: kern/kern_synch.c
> ===================================================================
> RCS file: src/sys/kern/kern_synch.c,v
> retrieving revision 1.166
> diff -u -p -r1.166 kern_synch.c
> --- kern/kern_synch.c 20 Mar 2020 17:13:51 -0000 1.166
> +++ kern/kern_synch.c 22 Mar 2020 04:25:32 -0000
> @@ -480,7 +480,7 @@ sleep_setup_signal(struct sleep_state *s
>   * stopped, p->p_wchan will be 0 upon return from CURSIG.
>   */
>   atomic_setbits_int(&p->p_flag, P_SINTR);
> - if (p->p_p->ps_single != NULL || (sls->sls_sig = CURSIG(p)) != 0) {
> + if ((p->p_flag & P_SUSPSINGLE) || (sls->sls_sig = CURSIG(p)) != 0) {
>   unsleep(p);
>   p->p_stat = SONPROC;
>   sls->sls_do_sleep = 0;
>

--
:wq Claudio