devel/avr/gcc bug -> devel/simulavr build error

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

devel/avr/gcc bug -> devel/simulavr build error

Christian Weisgerber
Since the introduction of retguard, devel/simulavr has continuously
failed to build on amd64.  This is actually a bug in devel/avr/gcc.
The problem was diagnosed early by mortimer@.  As I'm not making
any progress, I'm forwarding his analysis here to give other people
a chance to help out.

------------------------------------------------------------------------
Date: Wed, 9 May 2018 21:58:47 -0400
From: Todd Mortimer <[hidden email]>
To: Christian Weisgerber <[hidden email]>
Cc: [hidden email]
Subject: Re: Retguard needs a ports run

> build failure that happened again when I re-tried: devel/simulavr
>
> avr-gcc  -I. -I../src -I.     -g -Wall -mmcu=atmega128 -MT timer.o -MD -MP -MF .deps/timer.Tpo -c -o timer.o timer.c
> avr-gcc: Internal error: Abort trap (program cc1)
>
> I'm skeptical that this has anything to do with retguard, but it
> is unexpected.

This isn't a retguard failure - it's a buffer overwrite by one. The
overwrite smashes the stack protector, so the Abort is coming from the
stack smash handler:

>>> bt
#0  thrkill () at -:3
#1  0x00000e789907db2c in __stack_smash_handler (func=<optimized out>, damaged=<optimized out>) at /usr/src/lib/libc/sys/stack_protector.c:79
#2  0x00000e7667e2bdb2 in df_record_exit_block_uses ()
#3  0x00000e7667e313b7 in df_update_exit_block_uses ()
#4  0x00000e7667e2f44f in df_update_entry_exit_and_calls ()
#5  0x00000e7667f0a95c in thread_prologue_and_epilogue_insns ()
#6  0x00000e7667f05524 in rest_of_handle_thread_prologue_and_epilogue ()
#7  0x00000e7667fa3213 in execute_one_pass ()
#8  0x00000e7667fa2e9f in execute_pass_list ()
#9  0x00000e7667fa2ec7 in execute_pass_list ()
#10 0x00000e7667fa2ec7 in execute_pass_list ()
#11 0x00000e76680ccea6 in tree_rest_of_compilation ()
#12 0x00000e766827ac77 in cgraph_expand_function ()
#13 0x00000e766827b541 in cgraph_assemble_pending_functions ()
#14 0x00000e766827a9bd in cgraph_finalize_function ()
#15 0x00000e7667d14a8b in finish_function ()
#16 0x00000e7667d83b2b in c_parser_declaration_or_fndef ()
#17 0x00000e7667d8276f in c_parser_external_declaration ()
#18 0x00000e7667d818b7 in c_parser_translation_unit ()
#19 0x00000e7667d81617 in c_parse_file ()
#20 0x00000e7667d73022 in c_common_parse_file ()
#21 0x00000e76680680d1 in compile_file ()
#22 0x00000e7668066f35 in do_compile ()
#23 0x00000e7668066bc9 in toplev_main ()
#24 0x00000e7667d9d4ff in main ()

I stepped through the code to see where it was dying. It's like this:

- df_record_exit_block_uses() has a buffer on the stack

- it calls df_exit_block_uses_collect(), which iterates through the buffer
  setting entries.

- Before returning, df_exit_block_uses_collect() calls
  df_canonize_collection_rec(), which null terminates the buffer, which
  happens to null terminate just past the end of the buffer, which just
  happens to be the stack cookie.

- The cookie check fails, and it dies.

So it seems that the way that retguard is responsible for this is
because retguard changes the stack frame layout a bit, and the stack
cookie happens to be immediately next to the buffer now, and now it gets
whacked. This shouldn't be too hard to patch - it's just a buffer
overflow.

Thanks again!

:-)
Todd

--
Christian "naddy" Weisgerber                          [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: devel/avr/gcc bug -> devel/simulavr build error

Theo de Raadt-2
Another way people can approach this is to build the port with
-fstack-protector-all, and all it's dependencies.  The 1-byte
overflow may be easier to diagnose without retguard protection,
but the old school -all protector probably exposes it.

These days, -fstack-protector defaults to -fstack-protector-strong.
-strong is an innovation written by google staffers.  It (probably)
followed conversations I had with the years with some of them to improve
the original heuristic.  Dr. Etoh had written SSP to protect all
functions, but it was too expensive.  So in OpenBSD we went with only
protecting functions with >=16 bytes of local storage.  It was a middle
choice that got us progress.  Then other systems adopted the same
approach.  Over time, the compilers got better at tracking variable
types and dependencies and then google build -strong (search for blogs
discussing it), and we enabled this as the default.  The old school
stackprotector cookie method has the benefit that the debugger has an
easier time following traces.

Someone should compile the ports tree with -fstack-protector-all and
see what falls out.  Wait, don't just see, try to fix what you see :)

Christian Weisgerber <[hidden email]> wrote:

> Since the introduction of retguard, devel/simulavr has continuously
> failed to build on amd64.  This is actually a bug in devel/avr/gcc.
> The problem was diagnosed early by mortimer@.  As I'm not making
> any progress, I'm forwarding his analysis here to give other people
> a chance to help out.
>
> ------------------------------------------------------------------------
> Date: Wed, 9 May 2018 21:58:47 -0400
> From: Todd Mortimer <[hidden email]>
> To: Christian Weisgerber <[hidden email]>
> Cc: [hidden email]
> Subject: Re: Retguard needs a ports run
>
> > build failure that happened again when I re-tried: devel/simulavr
> >
> > avr-gcc  -I. -I../src -I.     -g -Wall -mmcu=atmega128 -MT timer.o -MD -MP -MF .deps/timer.Tpo -c -o timer.o timer.c
> > avr-gcc: Internal error: Abort trap (program cc1)
> >
> > I'm skeptical that this has anything to do with retguard, but it
> > is unexpected.
>
> This isn't a retguard failure - it's a buffer overwrite by one. The
> overwrite smashes the stack protector, so the Abort is coming from the
> stack smash handler:
>
> >>> bt
> #0  thrkill () at -:3
> #1  0x00000e789907db2c in __stack_smash_handler (func=<optimized out>, damaged=<optimized out>) at /usr/src/lib/libc/sys/stack_protector.c:79
> #2  0x00000e7667e2bdb2 in df_record_exit_block_uses ()
> #3  0x00000e7667e313b7 in df_update_exit_block_uses ()
> #4  0x00000e7667e2f44f in df_update_entry_exit_and_calls ()
> #5  0x00000e7667f0a95c in thread_prologue_and_epilogue_insns ()
> #6  0x00000e7667f05524 in rest_of_handle_thread_prologue_and_epilogue ()
> #7  0x00000e7667fa3213 in execute_one_pass ()
> #8  0x00000e7667fa2e9f in execute_pass_list ()
> #9  0x00000e7667fa2ec7 in execute_pass_list ()
> #10 0x00000e7667fa2ec7 in execute_pass_list ()
> #11 0x00000e76680ccea6 in tree_rest_of_compilation ()
> #12 0x00000e766827ac77 in cgraph_expand_function ()
> #13 0x00000e766827b541 in cgraph_assemble_pending_functions ()
> #14 0x00000e766827a9bd in cgraph_finalize_function ()
> #15 0x00000e7667d14a8b in finish_function ()
> #16 0x00000e7667d83b2b in c_parser_declaration_or_fndef ()
> #17 0x00000e7667d8276f in c_parser_external_declaration ()
> #18 0x00000e7667d818b7 in c_parser_translation_unit ()
> #19 0x00000e7667d81617 in c_parse_file ()
> #20 0x00000e7667d73022 in c_common_parse_file ()
> #21 0x00000e76680680d1 in compile_file ()
> #22 0x00000e7668066f35 in do_compile ()
> #23 0x00000e7668066bc9 in toplev_main ()
> #24 0x00000e7667d9d4ff in main ()
>
> I stepped through the code to see where it was dying. It's like this:
>
> - df_record_exit_block_uses() has a buffer on the stack
>
> - it calls df_exit_block_uses_collect(), which iterates through the buffer
>   setting entries.
>
> - Before returning, df_exit_block_uses_collect() calls
>   df_canonize_collection_rec(), which null terminates the buffer, which
>   happens to null terminate just past the end of the buffer, which just
>   happens to be the stack cookie.
>
> - The cookie check fails, and it dies.
>
> So it seems that the way that retguard is responsible for this is
> because retguard changes the stack frame layout a bit, and the stack
> cookie happens to be immediately next to the buffer now, and now it gets
> whacked. This shouldn't be too hard to patch - it's just a buffer
> overflow.
>
> Thanks again!
>
> :-)
> Todd
>
> --
> Christian "naddy" Weisgerber                          [hidden email]
>

Reply | Threaded
Open this post in threaded view
|

Re: devel/avr/gcc bug -> devel/simulavr build error

Grégoire Jadi
In reply to this post by Christian Weisgerber
Christian Weisgerber <[hidden email]> writes:
Hello,

I may have encountered the same bug when I tried to build programs
generated with devel/arduino. cc1 aborts with a coredump during the
compilation.

I think I have found the culprits: both ISR() and SIGNAL() macros from
avr/interrupt.h use the __attribute__((signal)).
(extract from avr/interrupt.h at the end of my mail)

I don't know how much retguard nor __attribute__((signal)) work, but my
guess is that they are stepping on each others foot.

Once I removed the signal attribute form the ISR and SIGNAL macros, I
was also able to compile the programs generated with devel/arduino
without coredump but it doesn't work on the arduino, I guess the
attribute wasn't there for nothing....

When I looked at devel/simulavr source code, the only occurence of the
ISR() macro was in the tests. I was able to build devel/simulavr by
disabling the tests (./configure --disable-tests). However, I wasn't
able to test it because the documentation is for a newer version
(1.0.0), so I end up upgrading devel/simulavr which does work (patch
attached).

Best,

avr/interrupt.h:
/** \def ISR(vector [, attributes])
    \ingroup avr_interrupts

    Introduces an interrupt handler function (interrupt service
    routine) that runs with global interrupts initially disabled
    by default with no attributes specified.

    The attributes are optional and alter the behaviour and resultant
    generated code of the interrupt routine. Multiple attributes may
    be used for a single function, with a space seperating each
    attribute.

    Valid attributes are ISR_BLOCK, ISR_NOBLOCK, ISR_NAKED and
    ISR_ALIASOF(vect).

    \c vector must be one of the interrupt vector names that are
    valid for the particular MCU type.
*/
#  define ISR(vector, [attributes])
#else  /* real code */

#if (__GNUC__ == 4 && __GNUC_MINOR__ >= 1) || (__GNUC__ > 4)
#  define __INTR_ATTRS used, externally_visible
#else /* GCC < 4.1 */
#  define __INTR_ATTRS used
#endif

#ifdef __cplusplus
#  define ISR(vector, ...)            \
    extern "C" void vector (void) __attribute__ ((signal,__INTR_ATTRS)) __VA_ARGS__; \
    void vector (void)
#else
#  define ISR(vector, ...)            \
    void vector (void) __attribute__ ((signal,__INTR_ATTRS)) __VA_ARGS__; \
    void vector (void)
#endif

#endif /* DOXYGEN */

#if defined(__DOXYGEN__)
/** \def SIGNAL(vector)
    \ingroup avr_interrupts

    Introduces an interrupt handler function that runs with global interrupts
    initially disabled.

    This is the same as the ISR macro without optional attributes.
    \deprecated Do not use SIGNAL() in new code. Use ISR() instead.
*/
#  define SIGNAL(vector)
#else  /* real code */

#ifdef __cplusplus
#  define SIGNAL(vector) \
    extern "C" void vector(void) __attribute__ ((signal, __INTR_ATTRS)); \
    void vector (void)
#else
#  define SIGNAL(vector) \
    void vector (void) __attribute__ ((signal, __INTR_ATTRS)); \
    void vector (void)
#endif



> Since the introduction of retguard, devel/simulavr has continuously
> failed to build on amd64.  This is actually a bug in devel/avr/gcc.
> The problem was diagnosed early by mortimer@.  As I'm not making
> any progress, I'm forwarding his analysis here to give other people
> a chance to help out.
>
> ------------------------------------------------------------------------
> Date: Wed, 9 May 2018 21:58:47 -0400
> From: Todd Mortimer <[hidden email]>
> To: Christian Weisgerber <[hidden email]>
> Cc: [hidden email]
> Subject: Re: Retguard needs a ports run
>
>> build failure that happened again when I re-tried: devel/simulavr
>>
>> avr-gcc  -I. -I../src -I.     -g -Wall -mmcu=atmega128 -MT timer.o -MD -MP -MF .deps/timer.Tpo -c -o timer.o timer.c
>> avr-gcc: Internal error: Abort trap (program cc1)
>>
>> I'm skeptical that this has anything to do with retguard, but it
>> is unexpected.
>
> This isn't a retguard failure - it's a buffer overwrite by one. The
> overwrite smashes the stack protector, so the Abort is coming from the
> stack smash handler:
>
>>>> bt
> #0  thrkill () at -:3
> #1  0x00000e789907db2c in __stack_smash_handler (func=<optimized out>, damaged=<optimized out>) at /usr/src/lib/libc/sys/stack_protector.c:79
> #2  0x00000e7667e2bdb2 in df_record_exit_block_uses ()
> #3  0x00000e7667e313b7 in df_update_exit_block_uses ()
> #4  0x00000e7667e2f44f in df_update_entry_exit_and_calls ()
> #5  0x00000e7667f0a95c in thread_prologue_and_epilogue_insns ()
> #6  0x00000e7667f05524 in rest_of_handle_thread_prologue_and_epilogue ()
> #7  0x00000e7667fa3213 in execute_one_pass ()
> #8  0x00000e7667fa2e9f in execute_pass_list ()
> #9  0x00000e7667fa2ec7 in execute_pass_list ()
> #10 0x00000e7667fa2ec7 in execute_pass_list ()
> #11 0x00000e76680ccea6 in tree_rest_of_compilation ()
> #12 0x00000e766827ac77 in cgraph_expand_function ()
> #13 0x00000e766827b541 in cgraph_assemble_pending_functions ()
> #14 0x00000e766827a9bd in cgraph_finalize_function ()
> #15 0x00000e7667d14a8b in finish_function ()
> #16 0x00000e7667d83b2b in c_parser_declaration_or_fndef ()
> #17 0x00000e7667d8276f in c_parser_external_declaration ()
> #18 0x00000e7667d818b7 in c_parser_translation_unit ()
> #19 0x00000e7667d81617 in c_parse_file ()
> #20 0x00000e7667d73022 in c_common_parse_file ()
> #21 0x00000e76680680d1 in compile_file ()
> #22 0x00000e7668066f35 in do_compile ()
> #23 0x00000e7668066bc9 in toplev_main ()
> #24 0x00000e7667d9d4ff in main ()
>
> I stepped through the code to see where it was dying. It's like this:
>
> - df_record_exit_block_uses() has a buffer on the stack
>
> - it calls df_exit_block_uses_collect(), which iterates through the buffer
>   setting entries.
>
> - Before returning, df_exit_block_uses_collect() calls
>   df_canonize_collection_rec(), which null terminates the buffer, which
>   happens to null terminate just past the end of the buffer, which just
>   happens to be the stack cookie.
>
> - The cookie check fails, and it dies.
>
> So it seems that the way that retguard is responsible for this is
> because retguard changes the stack frame layout a bit, and the stack
> cookie happens to be immediately next to the buffer now, and now it gets
> whacked. This shouldn't be too hard to patch - it's just a buffer
> overflow.
>
> Thanks again!
>
> :-)
> Todd

0001-Update-devel-simulavr-to-1.0.0.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: devel/avr/gcc bug -> devel/simulavr build error

Kurt Miller-4
On Wed, 2018-06-20 at 16:25 +0200, Grégoire Jadi wrote:
> Christian Weisgerber <[hidden email]> writes:
> Hello,
>
> I may have encountered the same bug when I tried to build programs
> generated with devel/arduino. cc1 aborts with a coredump during the
> compilation.

I just committed a fix for devel/avr/gcc, please rebuild it from an
updated ports tree or wait for the next set of snapshot packages.
devel/simulavr and devel/arduino build fine now.

-Kurt

Reply | Threaded
Open this post in threaded view
|

Re: devel/avr/gcc bug -> devel/simulavr build error

Grégoire Jadi
Kurt Miller <[hidden email]> writes:

> On Wed, 2018-06-20 at 16:25 +0200, Grégoire Jadi wrote:
>
>> Christian Weisgerber <[hidden email]> writes:
>> Hello,
>>
>> I may have encountered the same bug when I tried to build programs
>> generated with devel/arduino. cc1 aborts with a coredump during the
>> compilation.
>
> I just committed a fix for devel/avr/gcc, please rebuild it from an
> updated ports tree or wait for the next set of snapshot packages.
> devel/simulavr and devel/arduino build fine now.

Thanks! I'll try it when I can.

Regarding my patch to update devel/simulavr, should I submit it on a
dedicated thread?

Best,

Reply | Threaded
Open this post in threaded view
|

Re: devel/avr/gcc bug -> devel/simulavr build error

Kurt Miller-4
On Thu, 2018-06-21 at 17:35 +0200, Grégoire Jadi wrote:

> Kurt Miller <[hidden email]> writes:
> > >
> > On Wed, 2018-06-20 at 16:25 +0200, Grégoire Jadi wrote:
> >
> > >
> > > Christian Weisgerber <[hidden email]> writes:
> > > Hello,
> > >
> > > I may have encountered the same bug when I tried to build programs
> > > generated with devel/arduino. cc1 aborts with a coredump during the
> > > compilation.
> > I just committed a fix for devel/avr/gcc, please rebuild it from an
> > updated ports tree or wait for the next set of snapshot packages.
> > devel/simulavr and devel/arduino build fine now.
> Thanks! I'll try it when I can.
> > Regarding my patch to update devel/simulavr, should I submit it on a
> dedicated thread?
>

Yes please. I just pitched in for the gcc bug and don't use devel/simulavr.

I did quickly look at the patches in your diff and noticed the one for
gdbserver.cpp:

--- /dev/null
+++ b/devel/simulavr/patches/patch-src_cmd_gdbserver_cpp
@@ -0,0 +1,147 @@
+$OpenBSD$
+
+Remove `using namespace std` directive because of a conflict between
+std::bind() and bind(2).
+
+Index: src/cmd/gdbserver.cpp
+--- src/cmd/gdbserver.cpp.orig
++++ src/cmd/gdbserver.cpp
+@@ -24,7 +24,6 @@
+  */

+ #include <iostream>
+-using namespace std;


Try leaving the using namespace std in and putting :: in front of
bind(2) to have it use the global namespace explicitly. It should
make for a much shorter patch and accomplish the same thing I believe.
( e.g. ::bind(sock, ...)

-Kurt