testing rthreads

testing rthreads

Vladimir Kirillov-2
Hi, tech@!

I've been trying to test rthreads and have hit some weird races
using simple tests:

% cat rth.c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <err.h>

pthread_t worker;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void *
worker_run(void *arg)
{
        pthread_self();
        return (NULL);
}

int
main(int argc, char **argv)
{
        if (pthread_create(&worker, NULL, worker_run, NULL) != 0)
                err(1, "pthread_create");

        return (0);
}

I get this segfault almost always:

#0  pthread_exit (retval=0x0) at /usr/src/lib/librthread/rthread.c:223
223             for (clfn = thread->cleanup_fns; clfn; ) {
(gdb) bt
#0  pthread_exit (retval=0x0) at /usr/src/lib/librthread/rthread.c:223
#1  0x05716294 in _rthread_start (v=Could not find the frame base for
"_rthread_start".
) at /usr/src/lib/librthread/rthread.c:100
#2  0x05717be9 in rfork_thread () from /usr/lib/librthread.so.4.1
#3  0x21fb6b94 in ?? () from /usr/lib/libc.so.57.0
#4  0x3c00325f in __progname_storage ()
#5  0x3c003160 in environ ()
#6  0xcfbccc84 in ?? ()
#7  0x064c3b27 in _dl_bind_start () from /usr/libexec/ld.so
#8  0x87cd62e4 in ?? ()
#9  0x00000478 in ?? ()
#10 0xcfbc0033 in ?? ()
#11 0x064c0033 in ?? ()
#12 0x3c003160 in environ ()
#13 0x3c00325f in __progname_storage ()
#14 0xcfbccc84 in ?? ()
#15 0x21fb6b94 in ?? () from /usr/lib/libc.so.57.0
#16 0x00000000 in ?? ()

When trying to run the program in gdb I get weird stuff like:

(gdb) break worker_run
Breakpoint 1 at 0x1c000806: file /home/proger/rthreads/rth.c, line 12.
(gdb) start
Breakpoint 2 at 0x1c0007a0: file /home/proger/rthreads/rth.c, line 21.
Starting program: /home/proger/rthreads/obj/rth
main () at /home/proger/rthreads/rth.c:21
21      {
(gdb) n
main () at /home/proger/rthreads/rth.c:22
22              if (pthread_create(&worker, NULL, worker_run, NULL) != 0)
(gdb) n

Program received signal ?, Unknown signal.
0x1c0007d6 in main () at /home/proger/rthreads/rth.c:22
22              if (pthread_create(&worker, NULL, worker_run, NULL) != 0)
(gdb) n
warning: Signal ? does not exist on this system.

Program received signal SIGKILL, Killed.
0x1c0007d6 in main () at /home/proger/rthreads/rth.c:22
22              if (pthread_create(&worker, NULL, worker_run, NULL) != 0)


kdump | tail is:

31222 rth      CALL  getthrid() # child
31222 rth      RET   getthrid 1031222/0xfbc36
15365 rth      CALL  mprotect(0x2494c000,0x1000,0x1)
31222 rth      PSIG  SIGSEGV SIG_DFL code 1 addr=0x8c trapno=1
31222 rth      NAMI  "rth.core"
15365 rth      PSIG  SIGKILL SIG_DFL code 0

Looks like some stupid race with two threads exiting at almost the same
time? Any ideas on tracking it down?

By the way, is such gdb behaviour normal? Does it need any additional
patching before being useful to debug software using rthreads?

Thanks!


Re: testing rthreads

Ted Unangst-2
On Sun, Oct 24, 2010 at 7:21 PM, Vladimir Kirillov <[hidden email]>
wrote:
> I get this segfault almost always:
>
> #0  pthread_exit (retval=0x0) at /usr/src/lib/librthread/rthread.c:223
> 223             for (clfn = thread->cleanup_fns; clfn; ) {

That's weird.  Can you print out thread at that point?  I may be able
to look into it in a few days too.

> Looks like some stupid race with two threads exiting at almost the same
> time? Any ideas on tracking it down?

I don't see how two threads could be doing that to the same thread
though.  This is pre-reaper.

> By the way, is such gdb behaviour normal? Does it need any additional
> patching before being useful to debug software using rthreads?

gdb knows nothing about rthreads right now.  You can attach to the
main process, but that's it.


Re: testing rthreads

Philip Guenther-2
In reply to this post by Vladimir Kirillov-2
On Mon, 25 Oct 2010, Vladimir Kirillov wrote:
> I've been trying to test rthreads and have hit some weird races
> using simple tests:
...
> I get this segfault almost always:
>
> #0  pthread_exit (retval=0x0) at /usr/src/lib/librthread/rthread.c:223
> 223             for (clfn = thread->cleanup_fns; clfn; ) {
...
> Looks like some stupid race with two threads exiting at almost the same
> time? Any ideas on tracking it down?

The problem was actually introduced during the c2k10 hackathon, where I
changed getthrid() to always add THREAD_PID_OFFSET to the proc's real pid
(which closes a race for pthread_kill)...but failed to teach fork1() to do
that too.  The patch at bottom fixes this in my testing.


(I wasn't seeing this myself because I'm normally running a severely
hacked librthread that uses the platform's per-thread register to
implement pthread_self() instead of having to walk the thread list.
Sorry folks.  Time to get this stuff committed...)
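
A rough user-space sketch of the mismatch described above, not the actual
librthread or kernel code. It assumes the library records the id handed
back to the creating thread and that a list-walking pthread_self() later
matches it against getthrid(). All sim_* names, struct fake_thread, and
the FORK_THREAD bit value are hypothetical; the THREAD_PID_OFFSET value is
inferred from the kdump output in this thread (pid 31222 reporting
1031222).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: kdump shows pid 31222 returning thread id 1031222. */
#define THREAD_PID_OFFSET 1000000
/* Hypothetical flag bit for this sketch; not the kernel's real value. */
#define FORK_THREAD 0x1

/* Hypothetical stand-in for the library's per-thread record. */
struct fake_thread {
	int tid;			/* id recorded at creation time */
	struct fake_thread *next;
};

/* What getthrid() reports after the c2k10 change: always offset. */
int
sim_getthrid(int pid)
{
	return pid + THREAD_PID_OFFSET;
}

/* Value handed back to the creator, without and with the patch. */
int
sim_fork_retval(int pid, int flags, int patched)
{
	if (patched && (flags & FORK_THREAD))
		return pid + THREAD_PID_OFFSET;
	return pid;
}

/* Walk the list for the caller's id, as a list-based pthread_self() might. */
struct fake_thread *
sim_pthread_self(struct fake_thread *list, int my_tid)
{
	struct fake_thread *t;

	for (t = list; t != NULL; t = t->next)
		if (t->tid == my_tid)
			return t;
	return NULL;
}

/* Returns 1 if a new thread with this pid can find its own record. */
int
sim_self_lookup_ok(int pid, int patched)
{
	struct fake_thread rec;

	assert(pid > 0);
	rec.tid = sim_fork_retval(pid, FORK_THREAD, patched);
	rec.next = NULL;
	return sim_pthread_self(&rec, sim_getthrid(pid)) != NULL;
}
```

With patched set to 0, sim_self_lookup_ok(31222, 0) fails because the
record holds 31222 while the lookup asks for 1031222. In the real library
a failed lookup of this kind would leave pthread_exit() with a null thread
pointer, which would be consistent with the fault at a small address
(addr=0x8c) in the kdump output. With patched set to 1 the two ids agree.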


> By the way, is such gdb behaviour normal? Does it need any additional
> patching before being useful to debug software using rthreads?

Oh yes, it needs lots of work.  I don't know if anyone had really looked
closely at this yet.

Philip Guenther


Index: sys/kern/kern_fork.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.122
diff -u -p -r1.122 kern_fork.c
--- sys/kern/kern_fork.c 26 Jul 2010 01:56:27 -0000 1.122
+++ sys/kern/kern_fork.c 30 Oct 2010 22:52:39 -0000
@@ -480,7 +480,8 @@ fork1(struct proc *p1, int exitsig, int
  * marking us as parent via retval[1].
  */
  if (retval != NULL) {
- retval[0] = p2->p_pid;
+ retval[0] = p2->p_pid +
+    (flags & FORK_THREAD ? THREAD_PID_OFFSET : 0);
  retval[1] = 0;
  }
  return (0);