pthread_join() can return before thread has exited

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

pthread_join() can return before thread has exited

Josh Elsasser
The test suite for lang/sbcl has uncovered a race between
pthread_join() and the exiting thread: it may return before it's safe
to unmap a custom thread stack.

In the exiting thread:
  1) pthread_exit() calls _thread_release() via the tc_thread_release
      pointer.
  2) _thread_release() eventually calls _sem_post(&thread->donesem),
     and then returns.
  3) pthread_exit() then calls __threxit(&tib->tib_tid).

In the joining thread, _sem_wait(&thread->donesem, ...) then returns,
possibly before the exiting thread has reached __threxit().

I don't really know what the best fix would be, but waiting for
__threxit() to zero out tib_tid seems to work for me:

diff --git a/lib/librthread/rthread.c b/lib/librthread/rthread.c
index 8825e7844af..1fd1b5ee217 100644
--- a/lib/librthread/rthread.c
+++ b/lib/librthread/rthread.c
@@ -306,6 +306,9 @@ pthread_join(pthread_t thread, void **retval)
  if (retval)
  *retval = thread->retval;
 
+ while (thread->tib->tib_tid)
+ pthread_yield();
+
  /*
  * We should be the last having a ref to this thread,
  * but someone stupid or evil might haved detached it;


Following is a test program which exposes the race. You might need to
run it in a loop to see a segfault, especially on a uniprocessor
machine.


#include <sys/mman.h>
#include <pthread.h>
#include <err.h>

#define NTHREADS 100

pthread_mutex_t exitmtx = PTHREAD_MUTEX_INITIALIZER;

struct thrstk {
        pthread_t pth;
        void *stack;
} threads[NTHREADS];

void *
thrmain(void *arg)
{
        struct thrstk *thrstk = arg;

        if (thrstk == NULL)
                pthread_mutex_lock(&exitmtx);
        else {
                pthread_join(thrstk->pth, NULL);
                munmap(thrstk->stack, PTHREAD_STACK_MIN);
        }

        return NULL;
}

int
main()
{
        pthread_attr_t attr;
        struct thrstk *prev;
        int i;

        pthread_mutex_lock(&exitmtx);
        prev = NULL;
        for (i = 0; i < NTHREADS; i++) {
                threads[i].stack = mmap(NULL, PTHREAD_STACK_MIN,
                    PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON|MAP_STACK, -1, 0);
                if (threads[i].stack == NULL)
                        err(1, "mmap");
                pthread_attr_init(&attr);
                pthread_attr_setstack(&attr, threads[i].stack, PTHREAD_STACK_MIN);
                if (pthread_create(&threads[i].pth, &attr, thrmain, prev) != 0)
                        err(1, "pthread_create");
                pthread_attr_destroy(&attr);
                prev = &threads[i];
        }

        pthread_mutex_unlock(&exitmtx);
        thrmain(prev);

        return 0;
}

Reply | Threaded
Open this post in threaded view
|

Re: pthread_join() can return before thread has exited

Philip Guenther
On Tue, 20 Nov 2018, Josh Elsasser wrote:

> The test suite for lang/sbcl has uncovered a race between pthread_join()
> and the exiting thread: it may return before it's safe to unmap a custom
> thread stack.
>
> In the exiting thread:
>   1) pthread_exit() calls _thread_release() via the tc_thread_release
>       pointer.
>   2) _thread_release() eventually calls _sem_post(&thread->donesem),
>      and then returns.
>   3) pthread_exit() then calls __threxit(&tib->tib_tid).
>
> In the joining thread, _sem_wait(&thread->donesem, ...) then returns,
> possibly before the exiting thread has reached __threxit().

and that's *perfectly* legal by the spec.  The stack is relying on
behavior above and beyond the POSIX spec.  To quote XBD 2.9.8, "Use of
Application-Managed Thread Stacks":
  The application grants to the implementation permanent ownership of and
  control over the application-managed stack when the attributes object in
  which the stack or stackaddr attribute has been set is used, either by
  presenting that attribute's object as the attr argument in a call to
  pthread_create() that completes successfully, or by storing a pointer to
  the attributes object in the sigev_notify_attributes member of a struct
  sigevent and passing that struct sigevent to a function accepting such
  argument that completes successfully. The application may thereafter
  utilize the memory within the stack only within the normal context of
  stack usage within or properly synchronized with a thread that has been
  scheduled by the implementation with stack pointer value(s) that are
  within the range of that stack. In particular, the region of memory
  cannot be freed, nor can it be later specified as the stack for another
  thread.

Per that last sentence, the SBCL test program is invalid.


Philip

Reply | Threaded
Open this post in threaded view
|

Re: pthread_join() can return before thread has exited

Josh Elsasser
On Tue, Nov 20, 2018 at 08:19:22PM -0800, Philip Guenther wrote:

> On Tue, 20 Nov 2018, Josh Elsasser wrote:
> > The test suite for lang/sbcl has uncovered a race between pthread_join()
> > and the exiting thread: it may return before it's safe to unmap a custom
> > thread stack.
> >
> > In the exiting thread:
> >   1) pthread_exit() calls _thread_release() via the tc_thread_release
> >       pointer.
> >   2) _thread_release() eventually calls _sem_post(&thread->donesem),
> >      and then returns.
> >   3) pthread_exit() then calls __threxit(&tib->tib_tid).
> >
> > In the joining thread, _sem_wait(&thread->donesem, ...) then returns,
> > possibly before the exiting thread has reached __threxit().
>
> and that's *perfectly* legal by the spec.  The stack is relying on
> behavior above and beyond the POSIX spec.  To quote XBD 2.9.8, "Use of
> Application-Managed Thread Stacks":
>   The application grants to the implementation permanent ownership of and
>   control over the application-managed stack when the attributes object in
>   which the stack or stackaddr attribute has been set is used, either by
>   presenting that attribute's object as the attr argument in a call to
>   pthread_create() that completes successfully, or by storing a pointer to
>   the attributes object in the sigev_notify_attributes member of a struct
>   sigevent and passing that struct sigevent to a function accepting such
>   argument that completes successfully. The application may thereafter
>   utilize the memory within the stack only within the normal context of
>   stack usage within or properly synchronized with a thread that has been
>   scheduled by the implementation with stack pointer value(s) that are
>   within the range of that stack. In particular, the region of memory
>   cannot be freed, nor can it be later specified as the stack for another
>   thread.
>
> Per that last sentence, the SBCL test program is invalid.

Thanks for the clarification. I passed it upstream and sbcl will now
start threads with the os-allocated stack and then trampoline to the
custom stack.