Cranking the optimiser on selected ports?


Edd Barrett
Hi,

Having been pretty unsatisfied with the performance of fs-uae (and
emulators in general) on OpenBSD, I asked upstream what I could do to
make it run faster and get rid of jerky screen updates and frame drops. If
you press ctrl+f10 in fs-uae it displays diagnostics. I sent upstream a
screenshot. Surprisingly the slowness is not a symptom of slow graphics
rendering, but rather of cpu core emulation.

I experimented by cranking the optimiser to -O3 for fs-uae only.  I
wasn't expecting much of a difference, but in this particular case it
makes a big difference. It makes some of the more demanding Amiga games,
like Pinball Dreams and Pinball Fantasies, usable on OpenBSD (even with
the accuracy knob set to the highest).

Now, I know we don't usually like to turn the optimiser up past -O2 for
fear of compiler bugs making crappy code, however, the code gcc has made
in this case for my amd64 machine seems to be ok.

I spoke to Jasper on icb about the possibility of making exceptions. We
came to the conclusion that we could allow exceptions for selected
ports (I guess where latency/lag is critical) as long as we test
carefully.

We could:
 * Turn on -O3 unconditionally for selected ports.
 * Turn on -O3 for selected ports on a subset of architectures.
 * Make -O3 package flavours for selected ports.
 * Any other suggestions?

I prefer one of the first two options.

What do people think? Are any of the options acceptable?

Below is a diff that makes fs-uae listen to CFLAGS and CXXFLAGS and
unconditionally turns on -O3. Try what we have in tree, and then try again
with the diff, on an older machine (say an x61).

If you do testing, please let me know how you get on.

Index: Makefile
===================================================================
RCS file: /cvs/ports/emulators/fs-uae/Makefile,v
retrieving revision 1.5
diff -u -p -u -r1.5 Makefile
--- Makefile 7 Dec 2012 08:43:06 -0000 1.5
+++ Makefile 20 Dec 2012 13:19:23 -0000
@@ -6,7 +6,7 @@ COMMENT = modern Amiga emulator
 V = 2.0.1
 DISTNAME = fs-uae-$V
 CATEGORIES = emulators
-REVISION = 1
+REVISION = 2
 
 HOMEPAGE = http://fengestad.no/fs-uae/
 MAINTAINER = Edd Barrett <[hidden email]>
@@ -34,7 +34,12 @@ RUN_DEPENDS = devel/desktop-file-utils
  x11/py-wxPython
 
 USE_GMAKE = Yes
-MAKE_FLAGS += prefix=${PREFIX}
+
+# We don't usually crank the optimiser up this high, but
+# if you don't it really impacts emulation performance.
+CFLAGS = -O3 -pipe
+CXXFLAGS= ${CFLAGS}
+MAKE_FLAGS += prefix=${PREFIX} CXXFLAGS="${CXXFLAGS}" CFLAGS="${CFLAGS}"
 
 NO_REGRESS = Yes
 
Index: patches/patch-libfsemu_Makefile
===================================================================
RCS file: /cvs/ports/emulators/fs-uae/patches/patch-libfsemu_Makefile,v
retrieving revision 1.1.1.1
diff -u -p -u -r1.1.1.1 patch-libfsemu_Makefile
--- patches/patch-libfsemu_Makefile 22 Nov 2012 23:45:20 -0000 1.1.1.1
+++ patches/patch-libfsemu_Makefile 20 Dec 2012 13:19:23 -0000
@@ -1,9 +1,9 @@
 $OpenBSD: patch-libfsemu_Makefile,v 1.1.1.1 2012/11/22 23:45:20 edd Exp $
 
-Missing libpng flags
+Missing libpng flags. Strip hardcoded CFLAGS
 
---- libfsemu/Makefile.orig Tue Nov 20 00:28:32 2012
-+++ libfsemu/Makefile Tue Nov 20 00:28:44 2012
+--- libfsemu/Makefile.orig Fri Oct 26 17:28:39 2012
++++ libfsemu/Makefile Tue Dec 18 21:28:36 2012
 @@ -36,7 +36,7 @@ warnings = -Wall
  errors = -Werror=implicit-function-declaration
  cppflags = $(CXXFLAGS)
@@ -13,3 +13,16 @@ Missing libpng flags
  $(CFLAGS) -D_FILE_OFFSET_BITS=64
  objects = obj/emu_emu.o obj/emu_video.o obj/emu_audio.o obj/emu_input.o \
  obj/emu_menu.o obj/emu_texture.o obj/emu_font.o \
+@@ -53,12 +53,6 @@ objects = obj/emu_emu.o obj/emu_video.o obj/emu_audio.
+
+ ldflags = $(LDFLAGS)
+ libs =
+-
+-ifeq ($(debug), 1)
+- cflags += -g -O0 -fno-inline
+-else ifneq ($(noflags), 1)
+- cflags += -g -O2
+-endif
+
+ ifeq ($(os), windows)
+

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: Cranking the optimiser on selected ports?

Antoine Jacoutot-7
On Thu, Dec 20, 2012 at 01:54:25PM +0000, Edd Barrett wrote:

> Hi,
>
> Having been pretty unsatisfied with the performance of fs-uae (and
> emulators in general) on OpenBSD, I asked upstream what I could do to
> make it run faster and get rid of jerky screen updates and frame drops. If
> you press ctrl+f10 in fs-uae it displays diagnostics. I sent upstream a
> screenshot. Surprisingly the slowness is not a symptom of slow graphics
> rendering, but rather of cpu core emulation.
>
> I experimented by cranking the optimiser to -O3 for fs-uae only.  I
> wasn't expecting much of a difference, but in this particular case it
> makes a big difference. It makes some of the more demanding amiga games,
> like pinball dreams and pinball fantasies usable on OpenBSD (even with
> the accuracy knob set to the highest).
>
> Now, I know we don't usually like to turn the optimiser up past -O2 for
> fear of compiler bugs making crappy code, however, the code gcc has made
> in this case for my amd64 machine seems to be ok.
>
> I spoke to Jasper on icb about the possibility of making exceptions. We
> came to the conclusion that we could allow exceptions for selected
> ports (I guess where latency/lag is critical) as long as we test
> carefully.

FYI there are already ports that do this.

> We could:
>  * Turn on -O3 unconditionally for selected ports.
>  * Turn on -O3 for selected ports on a subset of architectures.
>  * Make -O3 package flavours for selected ports.
>  * Any other suggestions?
>
> I prefer one of the first two options.
>
> What do people think? Are any of the options acceptable?
>
> Below is a diff that makes fs-uae listen to CFLAGS and CXXFLAGS and
> unconditionally turns on -O3. Try what we have in tree, and then try again
> with the diff, on an older machine (say an x61).
>
> If you do testing, please let me know how you get on.
>
> Index: Makefile
> ===================================================================
> RCS file: /cvs/ports/emulators/fs-uae/Makefile,v
> retrieving revision 1.5
> diff -u -p -u -r1.5 Makefile
> --- Makefile 7 Dec 2012 08:43:06 -0000 1.5
> +++ Makefile 20 Dec 2012 13:19:23 -0000
> @@ -6,7 +6,7 @@ COMMENT = modern Amiga emulator
>  V = 2.0.1
>  DISTNAME = fs-uae-$V
>  CATEGORIES = emulators
> -REVISION = 1
> +REVISION = 2
>  
>  HOMEPAGE = http://fengestad.no/fs-uae/
>  MAINTAINER = Edd Barrett <[hidden email]>
> @@ -34,7 +34,12 @@ RUN_DEPENDS = devel/desktop-file-utils
>   x11/py-wxPython
>  
>  USE_GMAKE = Yes
> -MAKE_FLAGS += prefix=${PREFIX}
> +
> +# We don't usually crank the optimiser up this high, but
> +# if you don't it really impacts emulation performance.
> +CFLAGS = -O3 -pipe
> +CXXFLAGS= ${CFLAGS}
> +MAKE_FLAGS += prefix=${PREFIX} CXXFLAGS="${CXXFLAGS}" CFLAGS="${CFLAGS}"
>  
>  NO_REGRESS = Yes
>  
> Index: patches/patch-libfsemu_Makefile
> ===================================================================
> RCS file: /cvs/ports/emulators/fs-uae/patches/patch-libfsemu_Makefile,v
> retrieving revision 1.1.1.1
> diff -u -p -u -r1.1.1.1 patch-libfsemu_Makefile
> --- patches/patch-libfsemu_Makefile 22 Nov 2012 23:45:20 -0000 1.1.1.1
> +++ patches/patch-libfsemu_Makefile 20 Dec 2012 13:19:23 -0000
> @@ -1,9 +1,9 @@
>  $OpenBSD: patch-libfsemu_Makefile,v 1.1.1.1 2012/11/22 23:45:20 edd Exp $
>  
> -Missing libpng flags
> +Missing libpng flags. Strip hardcoded CFLAGS
>  
> ---- libfsemu/Makefile.orig Tue Nov 20 00:28:32 2012
> -+++ libfsemu/Makefile Tue Nov 20 00:28:44 2012
> +--- libfsemu/Makefile.orig Fri Oct 26 17:28:39 2012
> ++++ libfsemu/Makefile Tue Dec 18 21:28:36 2012
>  @@ -36,7 +36,7 @@ warnings = -Wall
>   errors = -Werror=implicit-function-declaration
>   cppflags = $(CXXFLAGS)
> @@ -13,3 +13,16 @@ Missing libpng flags
>   $(CFLAGS) -D_FILE_OFFSET_BITS=64
>   objects = obj/emu_emu.o obj/emu_video.o obj/emu_audio.o obj/emu_input.o \
>   obj/emu_menu.o obj/emu_texture.o obj/emu_font.o \
> +@@ -53,12 +53,6 @@ objects = obj/emu_emu.o obj/emu_video.o obj/emu_audio.
> +
> + ldflags = $(LDFLAGS)
> + libs =
> +-
> +-ifeq ($(debug), 1)
> +- cflags += -g -O0 -fno-inline
> +-else ifneq ($(noflags), 1)
> +- cflags += -g -O2
> +-endif
> +
> + ifeq ($(os), windows)
> +
>
> --
> Best Regards
> Edd Barrett
>
> http://www.theunixzoo.co.uk
>

--
Antoine

Re: Cranking the optimiser on selected ports?

Stuart Henderson
In reply to this post by Edd Barrett
On 2012/12/20 13:54, Edd Barrett wrote:
> We could:
>  * Turn on -O3 unconditionally for selected ports.
>  * Turn on -O3 for selected ports on a subset of architectures.
>  * Make -O3 package flavours for selected ports.
>  * Any other suggestions?
>
> I prefer one of the first two options.

I prefer 2.

On i386 you might also want to experiment with -fomit-frame-pointer which
frees up another register (of which i386 does not have many in the first
place) and avoids a couple of instructions per function call, at the
expense of debuggers not working with the produced object code.

Re: Cranking the optimiser on selected ports?

Christian Weisgerber
In reply to this post by Edd Barrett
Edd Barrett <[hidden email]> wrote:

> Subject: Cranking the optimiser on selected ports?

No.

> Now, I know we don't usually like to turn the optimiser up past -O2 for
> fear of compiler bugs making crappy code, however, the code gcc has made
> in this case for my amd64 machine seems to be ok.

Optimizer levels other than -O2 are essentially untested.  They may
reveal compiler bugs.  (We discovered a lot of instances of -O1 in
the ports tree because gcc3 on alpha frequently blew up on it.)
They may reveal broken source code (-fstrict-aliasing).  They may
produce broken code.  This will vary by architecture.  The produced
code is typically larger, which may be a pessimization if it overflows
a cache level.
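
As an illustration of the -fstrict-aliasing point, here is a minimal
sketch. It is not code from fs-uae or the ports tree; the function names
are hypothetical, and it assumes a 32-bit IEEE 754 float. The first
function is the kind of type punning that often appears to work until
the compiler starts applying strict aliasing rules; the second is a
well-defined way to get the same result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * BROKEN (hypothetical example): reads a float object through a
 * uint32_t pointer. This violates the C aliasing rules, so a compiler
 * applying strict aliasing is free to reorder or drop the access.
 */
uint32_t float_bits_broken(float f)
{
    return *(uint32_t *)&f;    /* undefined behaviour */
}

/*
 * OK: copy the object representation with memcpy. Compilers normally
 * turn this into a single register move, so it costs nothing extra.
 */
uint32_t float_bits_ok(float f)
{
    uint32_t u;

    /* Assumption of this sketch: float is 32 bits wide. */
    assert(sizeof(f) == sizeof(u));
    memcpy(&u, &f, sizeof(u));
    return u;
}
```

Code like float_bits_broken() is "broken source" in the sense above: the
bug is in the program, not the compiler, but it may only show up once
the optimisation settings change.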

> I spoke to Jasper on icb about the possibility of making exceptions.

I'm afraid this will lead to people asking for exceptions for more
and more ports.

> What do people think? Are any of the options acceptable?

I don't like it.  FWIW.

--
Christian "naddy" Weisgerber                          [hidden email]

Re: Cranking the optimiser on selected ports?

Marc Espie-2
On Thu, Dec 20, 2012 at 05:46:33PM +0000, Christian Weisgerber wrote:

> Edd Barrett <[hidden email]> wrote:
>
> > Subject: Cranking the optimiser on selected ports?
>
> No.
>
> > Now, I know we don't usually like to turn the optimiser up past -O2 for
> > fear of compiler bugs making crappy code, however, the code gcc has made
> > in this case for my amd64 machine seems to be ok.
>
> Optimizer levels other than -O2 are essentially untested.  They may
> reveal compiler bugs.  (We discovered a lot of instances of -O1 in
> the ports tree because gcc3 on alpha frequently blew up on it.)
> They may reveal broken source code (-fstrict-aliasing).  They may
> produce broken code.  This will vary by architecture.  The produced
> code is typically larger, which may be a pessimization if it overflows
> a cache level.
>
> > I spoke to Jasper on icb about the possibility of making exceptions.
>
> I'm afraid this will lead to people asking for exceptions for more
> and more ports.
>
> > What do people think? Are any of the options acceptable?
>
> I don't like it.  FWIW.

same as naddy. Specifically because -O2 is the only optimization level
that's really tested...

Re: Cranking the optimiser on selected ports?

John Long-4
On Thu, Dec 20, 2012 at 07:17:00PM +0100, Marc Espie wrote:

> On Thu, Dec 20, 2012 at 05:46:33PM +0000, Christian Weisgerber wrote:
> > Edd Barrett <[hidden email]> wrote:
> >
> > > Subject: Cranking the optimiser on selected ports?
> >
> > No.
> >
> > > Now, I know we don't usually like to turn the optimiser up past -O2 for
> > > fear of compiler bugs making crappy code, however, the code gcc has made
> > > in this case for my amd64 machine seems to be ok.
> >
> > Optimizer levels other than -O2 are essentially untested.  They may
> > reveal compiler bugs.  (We discovered a lot of instances of -O1 in
> > the ports tree because gcc3 on alpha frequently blew up on it.)
> > They may reveal broken source code (-fstrict-aliasing).  They may
> > produce broken code.  This will vary by architecture.  The produced
> > code is typically larger, which may be a pessimization if it overflows
> > a cache level.
> >
> > > I spoke to Jasper on icb about the possibility of making exceptions.
> >
> > I'm afraid this will lead to people asking for exceptions for more
> > and more ports.
> >
> > > What do people think? Are any of the options acceptable?
> >
> > I don't like it.  FWIW.
>
> same as naddy. Specifically because -O2 is the only optimization level
> that's really tested...
>

What about a FLAVOR on specific ports where somebody happens to know it
helps?

/jl

--
ASCII ribbon campaign ( ) Powered by Lemote Fuloong
 against HTML e-mail   X  Loongson MIPS and OpenBSD
   and proprietary    / \    http://www.mutt.org
     attachments     /   \  Code Blue or Go Home!
 Encrypted email preferred  PGP Key 2048R/DA65BC04

Re: Cranking the optimiser on selected ports?

Marc Espie-2
On Thu, Dec 20, 2012 at 06:20:50PM +0000, John Long wrote:
>
> What about a FLAVOR on specific ports where somebody happens to know it
> helps?
>
> /jl

It's still very likely to cause an extra amount of grief in most cases.
And more work for porters.

Reply | Threaded
Open this post in threaded view
|

Re: Cranking the optimiser on selected ports?

John Long-4
On Thu, Dec 20, 2012 at 07:32:44PM +0100, Marc Espie wrote:
> On Thu, Dec 20, 2012 at 06:20:50PM +0000, John Long wrote:
> >
> > What about a FLAVOR on specific ports where somebody happens to know it
> > helps?
> >
> > /jl
>
> It's still very likely to cause extra amount of grief in most cases.
> And more work for porters.

Ok, thanks/sorry.

Re: Cranking the optimiser on selected ports?

Edd Barrett
In reply to this post by Marc Espie-2
On Thu, Dec 20, 2012 at 07:17:00PM +0100, Marc Espie wrote:
> same as naddy. Specifically because -O2 is the only optimization level
> that's really tested...

Yup.

It's a shame that we can't trust gcc's optimiser. That said, since the
gcc4 update, the optimiser may be (more) correct; I just don't know.

What would proper testing of -O3 for gcc4 arches entail? Would this be
building the whole ports tree with -O3 and trying the packages? Is there a
more systematic approach, like a test suite or benchmark of correctness?

Cheers

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: Cranking the optimiser on selected ports?

joshua stein-3
In reply to this post by Christian Weisgerber
On Thu, 20 Dec 2012 at 17:46:33 +0000, Christian Weisgerber wrote:
> Optimizer levels other than -O2 are essentially untested.
[..]
> > I spoke to Jasper on icb about the possibility of making exceptions.
>
> I'm afraid this will lead to people asking for exceptions for more
> and more ports.

If the specific exceptions only come after people like Edd are
actually using the software and finding that it is much more usable
with the optimization bump, what's the harm?

I think per-architecture tweaks to specific ports are okay, as long
as they've been thoroughly tested on that architecture, are needed
to make the software more usable or better performing, and don't
make it less stable.

Re: Cranking the optimiser on selected ports?

Jasper Lievisse Adriaanse-2
In reply to this post by Stuart Henderson
On Thu, Dec 20, 2012 at 02:20:22PM +0000, Stuart Henderson wrote:

> On 2012/12/20 13:54, Edd Barrett wrote:
> > We could:
> >  * Turn on -O3 unconditionally for selected ports.
> >  * Turn on -O3 for selected ports on a subset of architectures.
> >  * Make -O3 package flavours for selected ports.
> >  * Any other suggestions?
> >
> > I prefer one of the first two options.
>
> I prefer 2.
>
> On i386 you might also want to experiment with -fomit-frame-pointer which
> frees up another register (of which i386 does not have many in the first
> place) and avoids a couple of instructions per function call, at the
> expense of debuggers not working with the produced object code.
I concur; it makes most sense for emulators and stuff like that on arches that
can really benefit from the extra (and safe) optimizations.

--
Cheers,
Jasper

"Stay Hungry. Stay Foolish"

Re: Cranking the optimiser on selected ports?

Stefan Sperling-8
In reply to this post by joshua stein-3
On Thu, Dec 20, 2012 at 03:08:45PM -0600, joshua stein wrote:
> If the specific exceptions only come after people like Edd are
> actually using the software and finding that it is much more usable
> with the optimization bump, what's the harm?

Yes, I agree. Especially WRT games/emulators. If bugs in compilers are
uncovered because of this, all the better. If we don't turn it on blindly
for stuff that hasn't been tested I don't see a problem.

Re: Cranking the optimiser on selected ports?

Edd Barrett
On Thu, Dec 20, 2012 at 10:49:40PM +0100, Stefan Sperling wrote:
> On Thu, Dec 20, 2012 at 03:08:45PM -0600, joshua stein wrote:
> > If the specific exceptions only come after people like Edd are
> > actually using the software and finding that it is much more usable
> > with the optimization bump, what's the harm?
>
> Yes, I agree. Especially WRT games/emulators. If bugs in compilers are
> uncovered because of this, all the better. If we don't turn it on blindly
> for stuff that hasn't been tested I don't see a problem.

Well, Espie has made it pretty clear that he will not allow -O3. If you
want better performance from emulators etc, then you will have to do a
custom package setting CFLAGS and CXXFLAGS. Don't forget to rebuild and
reinstall after every pkg_add -u.

The following diff at least makes fs-uae listen to CFLAGS and CXXFLAGS.
ok?

Index: Makefile
===================================================================
RCS file: /cvs/ports/emulators/fs-uae/Makefile,v
retrieving revision 1.5
diff -u -p -u -r1.5 Makefile
--- Makefile 7 Dec 2012 08:43:06 -0000 1.5
+++ Makefile 20 Dec 2012 22:00:48 -0000
@@ -6,7 +6,7 @@ COMMENT = modern Amiga emulator
 V = 2.0.1
 DISTNAME = fs-uae-$V
 CATEGORIES = emulators
-REVISION = 1
+REVISION = 2
 
 HOMEPAGE = http://fengestad.no/fs-uae/
 MAINTAINER = Edd Barrett <[hidden email]>
@@ -34,7 +34,8 @@ RUN_DEPENDS = devel/desktop-file-utils
  x11/py-wxPython
 
 USE_GMAKE = Yes
-MAKE_FLAGS += prefix=${PREFIX}
+
+MAKE_FLAGS += prefix=${PREFIX} CXXFLAGS="${CXXFLAGS}" CFLAGS="${CFLAGS}"
 
 NO_REGRESS = Yes
 
Index: patches/patch-libfsemu_Makefile
===================================================================
RCS file: /cvs/ports/emulators/fs-uae/patches/patch-libfsemu_Makefile,v
retrieving revision 1.1.1.1
diff -u -p -u -r1.1.1.1 patch-libfsemu_Makefile
--- patches/patch-libfsemu_Makefile 22 Nov 2012 23:45:20 -0000 1.1.1.1
+++ patches/patch-libfsemu_Makefile 20 Dec 2012 22:00:48 -0000
@@ -1,9 +1,9 @@
 $OpenBSD: patch-libfsemu_Makefile,v 1.1.1.1 2012/11/22 23:45:20 edd Exp $
 
-Missing libpng flags
+Missing libpng flags. Strip hardcoded CFLAGS
 
---- libfsemu/Makefile.orig Tue Nov 20 00:28:32 2012
-+++ libfsemu/Makefile Tue Nov 20 00:28:44 2012
+--- libfsemu/Makefile.orig Fri Oct 26 17:28:39 2012
++++ libfsemu/Makefile Tue Dec 18 21:28:36 2012
 @@ -36,7 +36,7 @@ warnings = -Wall
  errors = -Werror=implicit-function-declaration
  cppflags = $(CXXFLAGS)
@@ -13,3 +13,16 @@ Missing libpng flags
  $(CFLAGS) -D_FILE_OFFSET_BITS=64
  objects = obj/emu_emu.o obj/emu_video.o obj/emu_audio.o obj/emu_input.o \
  obj/emu_menu.o obj/emu_texture.o obj/emu_font.o \
+@@ -53,12 +53,6 @@ objects = obj/emu_emu.o obj/emu_video.o obj/emu_audio.
+
+ ldflags = $(LDFLAGS)
+ libs =
+-
+-ifeq ($(debug), 1)
+- cflags += -g -O0 -fno-inline
+-else ifneq ($(noflags), 1)
+- cflags += -g -O2
+-endif
+
+ ifeq ($(os), windows)
+

--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk

Re: Cranking the optimiser on selected ports?

Stuart Henderson
In reply to this post by Edd Barrett
On 2012/12/20 19:41, Edd Barrett wrote:
> What would proper testing of -O3 for gcc4 arches entail? Would this be
> building the whole ports tree -O3 and trying the packages.

While I think there are (a very few) cases where it makes sense to
test/enable higher O levels on an arch-by-arch basis, doing this
for the whole tree would be insane.

> Is there a more systematic approach, like a test suite or benchmark
> of correctness?

Bugs notwithstanding, optimisers assume things about code and change
results of undefined behaviour and corner cases in ways people
don't expect. Test suites and benchmarks of the compiler/optimiser
aren't going to help predict what it will do with some unknown code.
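
A small sketch of the kind of corner case meant here. It is not taken
from any port; the function names are hypothetical. Signed integer
overflow is undefined in C, so an optimiser is allowed to assume it
never happens, and a check written in terms of the overflow itself can
silently change its answer depending on optimisation settings.

```c
#include <assert.h>
#include <limits.h>

/*
 * BROKEN (hypothetical example): relies on x + 1 wrapping around.
 * Because signed overflow is undefined, the compiler may assume
 * x + 1 is always greater than x and reduce this to "return 0".
 */
int next_would_overflow_broken(int x)
{
    return x + 1 < x;
}

/*
 * OK: test against the limit before doing any arithmetic, so no
 * overflow can occur and the result does not depend on the optimiser.
 */
int next_would_overflow_ok(int x)
{
    return x == INT_MAX;
}

/* Small self-check of the well-defined version only. */
void check_overflow_helper(void)
{
    assert(next_would_overflow_ok(INT_MAX) == 1);
    assert(next_would_overflow_ok(0) == 0);
}
```

No compiler test suite can tell you whether an unknown program contains
code like the first function; only reading or exercising that program
will.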

If (and only if) a particular very performance-sensitive program
has regularly been built on other OSes on a certain arch with a certain
optimisation level with a similar compiler, I think we might consider
that. Otherwise stick with the OS default.

Re: Cranking the optimiser on selected ports?

Christian Weisgerber
In reply to this post by Edd Barrett
Edd Barrett <[hidden email]> wrote:

> What would proper testing of -O3 for gcc4 arches entail? Would this be
> building the whole ports tree -O3 and trying the packages.

We can't even get comprehensive testing of the existing packages
on amd64/i386, much less across all our architectures.

--
Christian "naddy" Weisgerber                          [hidden email]