SIGBUS on octeon for my program

Peter J. Philipp-3
Hi,

My DNS program gets a SIGBUS when I execute it.  I have ktraced it, upped
limits and searched in the mips64 source for answers.  Could this be a
compiler problem?

ktrace----->
 41651 dddctl   CALL  connect(6,0xfffffcacb0,16)
 41651 dddctl   STRU  struct sockaddr { AF_INET, 192.168.177.2:10053 }
 41651 dddctl   RET   connect 0
 41651 dddctl   CALL  kbind(0xfffffc9b48,24,0x801d30cbade359aa)
 41651 dddctl   RET   kbind 0
 41651 dddctl   PSIG  SIGBUS SIG_DFL code BUS_ADRALN<1> addr=0xfffffca17d trapno=0
 82637 dddctl   RET   wait4 41651/0xa2b3
<-----------

I found the SIGBUS code BUS_ADRALN in /sys/arch/mips64/mips64/trap.c around
line 463 on OpenBSD 6.6:

------------>
        case T_ADDR_ERR_LD+T_USER:      /* misaligned or kseg access */
        case T_ADDR_ERR_ST+T_USER:      /* misaligned or kseg access */
                ucode = 0;              /* XXX should be PROT_something */
                signal = SIGBUS;
                sicode = BUS_ADRALN;
                break;
<-----------

I have also set the stack ulimit to 32K but no relief.  I'm stuck, wondering
if you guys can help with interpreting this.

My program can be downloaded with

ftp https://delphinusdns.org/download/snapshot/delphinusdnsd-snapshot.tgz

The snapshot is rebuilt at midnight CET every day.

As far as I know it should work on macppc although this particular function
wasn't tested on macppc.  And it works on amd64 as I run this delphinusdnsd
in production on my personal nameservers.  Getting this working on octeon
would broaden my test network.

Best Regards,
-peter

Re: SIGBUS on octeon for my program

Janne Johansson-3
A fix for the stack getting unaligned was committed just recently, do you
have that?
If not, test on -current.


--
May the most significant bit of your life be positive.

Re: SIGBUS on octeon for my program

David Higgs
In reply to this post by Peter J. Philipp-3
I don't speak ktrace, but it looks like an alignment problem with a stack
variable.  What does gdb report?

--david

Re: SIGBUS on octeon for my program

Peter J. Philipp-3
In reply to this post by Janne Johansson-3
On Wed, Nov 27, 2019 at 03:13:19PM +0100, Janne Johansson wrote:
> There was a fix recently for the stack getting unaligned committed just
> recently, do you have that?
> If not, test on current.

Ok it'll take me a bit to test on current, as I don't have that fix yet.

Thanks a lot!  I'll come back in a few days to report how it went.

Best Regards,

-peter


Re: SIGBUS on octeon for my program

Peter J. Philipp-3
In reply to this post by David Higgs
On Wed, Nov 27, 2019 at 09:16:51AM -0500, David Higgs wrote:
> I don't speak ktrace but looks like alignment problems with a stack
> variable.  What does gdb report?
>
> --david

Hi David,

I'm going to upgrade to -current and then report back.  It'll take me a few
days to do that (I'm super slow).

FWIW though:

----->
(gdb) run configtest                                                            
Starting program: /usr/local/bin/dddctl configtest            
(no debugging symbols found)                                                    
ptrace: Invalid argument.    
_thread_sys_fork () at -:15                                                    
15      -: No such file or directory.
        in -
(gdb) bt
#0  _thread_sys_fork () at -:15
#1  0x0000004236424ac8 in _libc_fork_wrap ()
    at /usr/src/lib/libc/sys/w_fork.c:51
#2  0x0000003d5b7b3028 in drop_privs () from /usr/local/bin/dddctl
#3  0x000000ffffffce60 in ?? ()
warning: GDB can't find the start of the function at 0xffffffce60.

    GDB is unable to find the start of the function at 0xffffffce60
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.                                                            
    This problem is most likely caused by an invalid program counter or
stack pointer.                          
    However, if you think GDB should simply search farther back  
from 0xffffffce60 for code which looks like the beginning of a
function, you can increase the range of the search using the `set  
heuristic-fence-post' command.                                                  
Previous frame inner to this frame (corrupt stack?)            
<----

I'll see if this sort of issue repeats after I upgrade the octeon router
to -current.

Thanks!
-peter

Re: SIGBUS on octeon for my program

Peter J. Philipp-3
On Wed, Nov 27, 2019 at 03:30:23PM +0100, Peter J. Philipp wrote:

> Hi David,
>
> I'm going to upgrade to -current and then report back.. it'll take me a few
> days to do that (I'm super slow).
>
> ...
> I'll see if this sort of issue repeats after I upgrade the octeon router
> to -current.
>
> Thanks!
> -peter

Hi David and Misc@,

OK this is weird.  For people who are just reading this thread, it's about
OpenBSD/octeon SIGBUS'ing my program with an alignment error.

I upgraded to -current (very pleased that pppoe0 works now, as an aside),
and got the debugger attached right after the fork.  Here is the output of gdb:

debug.1--->
Continuing.

Program received signal SIGBUS, Bus error.
lookup_axfr (f=0x2c6274db78, so=6, zonename=0x2c6b3d2130 "words.",
    mysoa=0xfffffe7920, format=20, tsigkey=0x2c37b083d0 "pass",
    tsigpass=0xfffffe7b60 "*censored*=",
    segment=0xfffffe7b40, answers=0xfffffe7b3c, additionalcount=0xfffffe7b38)
    at util.c:1842
1842            *type = htons(DNS_TYPE_AXFR);
(gdb) list
1837
1838            memcpy(p, name, len);
1839            totallen += len;
1840
1841            type = (u_int16_t *)&query[totallen];
1842            *type = htons(DNS_TYPE_AXFR);
1843            totallen += sizeof(u_int16_t);
1844
1845            class = (u_int16_t *)&query[totallen];
1846            *class = htons(DNS_CLASS_IN);
(gdb) print type
$1 = (u_int16_t *) 0xfffffe763d
<-----

So I point a pointer of type u_int16_t at a memory address (which may be
unaligned) and then store a value through it.  I've always done it this way
throughout my program, but just to be sure that it isn't caused by the stack
somehow, I moved the query variable to the heap (or its equivalent) and tried
again.

debug.2--->
Loaded symbols for /usr/libexec/ld.so
pull_remote_zone (lrz=0x216549d000) at parse.y:3605
3605                            while (debugger == 1)
(gdb) set debugger=0
Current language:  auto; currently minimal
(gdb) cont
Continuing.

Program received signal SIGBUS, Bus error.
lookup_axfr (f=0x21b8dfdb78, so=6, zonename=0x22041a4a10 "words.",
    mysoa=0xffffff9050, format=20, tsigkey=0x22052c35b0 "pass",
    tsigpass=0xffffff9290 "*censored*=",
    segment=0xffffff9270, answers=0xffffff926c, additionalcount=0xffffff9268)
    at util.c:1849
1849            *type = htons(DNS_TYPE_AXFR);
(gdb) print type
$1 = (u_int16_t *) 0x21d57f1015
(gdb) print query
$2 = 0x21d57f1000 ""
(gdb) print (char*)type - query
$3 = 21
<-------

So now I have the true offset of the variable type, and it's not aligned
(21 is odd).  What can I do here?  Does this need fixing in OpenBSD/octeon or
do I have to fix my code for this?  I don't really know what to use instead,
other than memcpy'ing the value rather than storing it through (overloading?)
the 16-bit integer pointer.
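
For readers following along, here is a minimal sketch of that memcpy
approach.  It assumes query is a plain byte buffer; the helper name
put_u16() is hypothetical and not taken from delphinusdnsd.  The value is
converted in an aligned local variable and then copied byte-wise, so the
destination address may be odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htons */

/*
 * Hypothetical helper: store a 16-bit value in network byte order at
 * buf[off], whatever the alignment of that address.  Returns the offset
 * just past the bytes written.  The caller must ensure the buffer has
 * room for two more bytes.
 */
static size_t
put_u16(unsigned char *buf, size_t off, uint16_t host_value)
{
	uint16_t net_value = htons(host_value);	/* aligned local copy */

	assert(buf != NULL);
	memcpy(&buf[off], &net_value, sizeof(net_value));
	return off + sizeof(net_value);
}
```

With something like this, lines 1841-1843 of the listing above would
collapse to a single call along the lines of
totallen = put_u16(query, totallen, DNS_TYPE_AXFR), with a cast if query is
declared as char *.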

Any help is welcomed!

Best regards,
-peter

Reply | Threaded
Open this post in threaded view
|

Re: SIGBUS on octeon for my program

Theo de Raadt-2
Half the cpu platforms fault on unaligned access.

There are strategies for handling this. Your code must use them.

It is kind of boring, actually.


Re: SIGBUS on octeon for my program

Peter J. Philipp-3
On Thu, Nov 28, 2019 at 11:44:07PM -0700, Theo de Raadt wrote:
> Half the cpu platforms fault on unaligned access.
>
> There are strategies for handling this. Your code must use them.
>
> It is kind of boring, actually.

I took a look at how libasr does it, and I already have similar helpers, i.e.
pack8(), pack16() and pack32().  I will just change all my functions to use
them, as the -m compiler flags for unaligned access aren't standard on every
arch.  I did use __packed on structs already, but the way libasr does it is a
great example, I think.

It's a bit of work, but the fact that this could work on architectures like
octeon makes it worth it for me.
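
As a rough illustration of the pack-helper style mentioned above (this is
not libasr's actual code, and the names struct outbuf, out_u8(), out_u16()
and out_u32() are made up for this sketch), the idea is to write every
multi-byte value one byte at a time in big-endian order.  Byte stores have
no alignment requirement, and no htons() is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded output buffer. */
struct outbuf {
	unsigned char	*data;
	size_t		 len;	/* capacity in bytes */
	size_t		 off;	/* next write position, always <= len */
};

/* Append one byte; returns 0 on success, -1 if the buffer is full. */
static int
out_u8(struct outbuf *o, uint8_t v)
{
	assert(o != NULL && o->data != NULL);
	if (o->off >= o->len)
		return -1;
	o->data[o->off++] = v;
	return 0;
}

/* Append a 16-bit value, most significant byte first. */
static int
out_u16(struct outbuf *o, uint16_t v)
{
	assert(o != NULL);
	if (o->len - o->off < 2)
		return -1;
	(void)out_u8(o, (uint8_t)(v >> 8));
	(void)out_u8(o, (uint8_t)(v & 0xff));
	return 0;
}

/* Append a 32-bit value, most significant byte first. */
static int
out_u32(struct outbuf *o, uint32_t v)
{
	assert(o != NULL);
	if (o->len - o->off < 4)
		return -1;
	(void)out_u16(o, (uint16_t)(v >> 16));
	(void)out_u16(o, (uint16_t)(v & 0xffff));
	return 0;
}
```

Keeping the capacity check inside the helpers means a caller that builds a
query cannot run past the end of the buffer by accident.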

Thanks a lot!
-peter

Re: SIGBUS on octeon for my program

David Higgs
Be warned that __packed doesn't do quite what you think it does.

void func(int *p) {
    *p = 0;
}

If you pass an unaligned pointer into this function on a strict-alignment
platform, your program will likely crash.  I am unaware of any attribute
that can inform the compiler that 'p' may be misaligned.
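
One way to sidestep the problem, sketched here under the assumption that
the callee can be changed, is to take a void * and go through memcpy() so
that the compiler never assumes int alignment.  The names store_int() and
load_int() are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Store an int at p without assuming p is int-aligned.  Because the
 * parameter is void *, the compiler has no alignment information to
 * rely on and must emit a copy that is safe for any address.
 */
static void
store_int(void *p, int v)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}

/* Matching load from a possibly misaligned address. */
static int
load_int(const void *p)
{
	int v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```

The values are stored in host byte order here; for wire formats the
conversion to network order still has to happen separately.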

--david
