Updated compiler flag args to speed up Dreamcast projects
Posted: Sat Jan 19, 2019 3:01 am
Hi, I'm sharing this. I tested all of these on lots of projects, and you can get a lot of speed out of a project by using these -O levels plus my chart. I spent a lot of time on this back in the day.
Most people use -O3 but don't know what it actually enables. It does not enable -funroll-loops, for example, so by adding flags on top of -O3 you can tune your compile for more speed. Here's what -O3 enables:
```
GNU CPP version 3.0.4 (cpplib) (Hitachi SH)
GNU C version 3.0.4 (sh-elf)
compiled by GNU C version 3.2 20020927 (prerelease).
options passed: -lang-c -v -I/usr/local/dc/kos-1.1.9/include
-I/usr/local/dc/kos-1.1.9/libc/include
-I/usr/local/dc/kos-1.1.9/kernel/arch/dreamcast/include -DGNUC=3
-DGNUC_MINOR=0 -DGNUC_PATCHLEVEL=4 -Dsh -DELF -Dsh
-DELF -Acpu=sh -Amachine=sh -DOPTIMIZE -DSTDC_HOSTED=1
-DLITTLE_ENDIAN -DSH4_SINGLE_ONLY -DSDL -DLSB_FIRST -DALIGN_LONG
-DINLINE -DDC -D_arch_dreamcast -ml -m4-single-only -O3
options enabled: -fdefer-pop -foptimize-sibling-calls -fcse-follow-jumps
-fcse-skip-blocks -fexpensive-optimizations -fthread-jumps
-fstrength-reduce -fpeephole -fforce-mem -ffunction-cse -finline-functions
-finline -fkeep-static-consts -fcaller-saves -freg-struct-return
-fdelayed-branch -fgcse -frerun-cse-after-loop -frerun-loop-opt
-fdelete-null-pointer-checks -fschedule-insns2 -fsched-interblock
-fsched-spec -fbranch-count-reg -freorder-blocks -frename-registers
-fcommon -fgnu-linker -fregmove -foptimize-register-move -fargument-alias
-fstrict-aliasing -fident -fpeephole2 -fguess-branch-probability
-fmath-errno -m1 -m2 -m3 -m3e -m4-single-only -m4-nofpu -ml
```
Those are the flags -O3 enables by default. Here is my chart of the extra flags I tested on top of it and what each one did:
```make
# -falign-functions -falign-loops -falign-labels -falign-jumps
# -fstrict-aliasing -ffast-math -fomit-frame-pointer \
# -fdelete-null-pointer-checks -funroll-all-loops -fno-optimize-sibling-calls \
# -falign-loops -ffloat-store \
# -frename-registers
# -funroll-all-loops -> adds about 15% speed
# -fno-optimize-sibling-calls -> adds speed
# -funroll-all-loops (in environ-dc.sh) -> adds speed
# -fomit-frame-pointer -> small effect?
# -falign-loops -> small improvement
# -falign-labels
# -falign-functions=32 -> no effect?
# -fssa -> no effect?
# -fexpensive-optimizations -> no effect
# -mbigtable -> no effect
# -mfmovd -> no effect
# -fno-builtin -> no effect
# -fno-gcse -> no effect
# -falign-jumps -> no effect
# -falign-jumps=32 -> no effect
# -fno-guess-branch-probability -> slower
# -fmove-all-movables (except drawgfx.o) -> slower
# -finline-functions & -finline-limit=10000 -> slows things down some
# -fno-strict-aliasing -> display error?
# -fssa & -fdce -> no display!
# -mrelax -> segmentation fault in compilation
# -freduce-all-givs -> drawgfx won't compile
# -fno-for-scope -fno-delayed-branch -> fixes "pc-rel too far"
```
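To see why -funroll-all-loops can both add speed and bloat the binary (the 150k mentioned further down), here is roughly what unrolling does, written out by hand in C. This only illustrates the transformation with an arbitrarily chosen unroll factor of 4; it is not GCC's actual output, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one counter update, compare and branch per element. */
int sum_plain(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same loop unrolled by 4, similar in spirit to what -funroll-loops does:
   the loop overhead is paid once per four elements, but the body is
   duplicated, which is where the extra code size comes from. */
int sum_unrolled4(const int *a, size_t n)
{
    int total = 0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Remainder: the 0 to 3 elements left over. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Both functions return the same result for any length; the unrolled one simply trades code size for fewer branches, which is the same trade-off the chart shows (speed up, binary bigger).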
Try these on your projects and add the best combo I found to your makefile.
This chart is based on a MAME driver.
KOS flags used alongside all the others:
-Wall -ml -mbigtable -mnomacsave -m4-single-only -pipe
These give faster compiling and better memory management, but no speed-up to the main emulation at all. They set up the CPU mode and FPU mode (-ml selects little-endian, -m4-single-only targets the SH4 with single-precision-only floating point), and -pipe only makes compiling faster.
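When juggling makefiles like this it's easy to lose track of which flags a given build really got. GCC's predefined macros can tell you from inside the program. A minimal C sketch, assuming a reasonably recent GCC, where __OPTIMIZE__ is defined for any optimizing build, __OPTIMIZE_SIZE__ for -Os, __FAST_MATH__ for -ffast-math and __NO_INLINE__ when inlining is off. An old toolchain like GCC 3.0.4 may not define all of these, and report_build_flags is just a made-up name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints which optimization-related GCC macros were
   defined when this file was compiled. Returns 1 if __OPTIMIZE__ was
   defined (an optimizing build), 0 otherwise. */
int report_build_flags(void)
{
    int optimized = 0;

#ifdef __OPTIMIZE__
    optimized = 1;
    puts("__OPTIMIZE__: yes (optimizing build)");
#else
    puts("__OPTIMIZE__: no (-O0)");
#endif

#ifdef __OPTIMIZE_SIZE__
    puts("__OPTIMIZE_SIZE__: yes (-Os)");
#endif

#ifdef __FAST_MATH__
    puts("__FAST_MATH__: yes (-ffast-math)");
#endif

#ifdef __NO_INLINE__
    puts("__NO_INLINE__: yes (no inlining)");
#endif

    return optimized;
}
```

Calling it once at startup and reading the output over your debug console is enough. Note that none of these macros tell -O2 apart from -O3, so they only catch gross mistakes such as a file accidentally built at -O0.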
Optimization levels with no extra flags (a standard GCC optimization level test):
-O9 -> 27 fps, binary 1922 KB (about 1.9 MB), larger than the others
-O8 -> 27 fps, binary 1877 KB (about 1.8 MB), same size as below
-O7 -> 27 fps, binary 1877 KB (about 1.8 MB)
-O6 -> 27 fps, binary 1877 KB (about 1.8 MB)
-O5 -> 27 fps, binary 1877 KB (about 1.8 MB)
-O4 -> 26 fps, slower. Strange, as the GCC docs say there is nothing above -O3.
-O3 -> 28 fps. Again, why is this faster than -O9? Is -O9 bogus?
-O2 -> 27 fps, 1 fps slower than -O3
-O1 -> 24 fps. This shows -O2 and -O3 at least work.
-Os -> 26 fps, slower by 1 fps, but a 1.722 MB binary. That can make or break loading a large ROM.
-O0 -> 13 fps. Ouch!
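To compare the numbers above as percentages (the way the 5% to 10% gains at the end are stated), a tiny helper is enough. A sketch in C; percent_change is a name made up for this example.

```c
#include <assert.h>

/* Percentage change going from `before` to `after`.
   Positive means faster (or bigger), negative means slower (or smaller). */
double percent_change(double before, double after)
{
    assert(before != 0.0); /* a 0 baseline has no meaningful percentage */
    return (after - before) / before * 100.0;
}
```

For example, percent_change(27, 28) gives about +3.7% (-O2 to -O3), and percent_change(13, 28) about +115% (-O0 to -O3).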
Individual optimization flags:
-funroll-all-loops -> added almost 150 KB of bloat; the ROM will no longer load
-fschedule-insns2 -> smaller binary, no speed-up at all, but a smoother frame rate
-fstrict-aliasing -> added 1 to 2 fps, worth using, but less smooth (jerky)
-fexpensive-optimizations -> speed dropped from 28 fps to 25 fps
-fomit-frame-pointer -> smaller binary but a 1 fps loss. This was a shock!
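-fstrict-aliasing lets GCC assume that pointers to different types never refer to the same memory, which may be part of where the extra 1 to 2 fps comes from. The catch is that code reading an object through a pointer of the wrong type, a common trick in emulator and graphics code, can then be miscompiled. A minimal C sketch of the unsafe pattern and a portable replacement, assuming 32-bit IEEE-754 floats; float_bits is an illustrative name, not from any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unsafe under -fstrict-aliasing: reading a float through a uint32_t
   pointer breaks C's aliasing rules, so GCC may reorder or drop it:
       uint32_t bits = *(uint32_t *)&f;   // don't do this            */

/* Safe: copy the bytes instead. With optimization on, GCC usually turns
   this memcpy into a plain register move, so it shouldn't cost speed. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, float_bits(1.0f) returns 0x3F800000.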
The best setting ended up being:
-O4 -fomit-frame-pointer -ffast-math -fno-optimize-sibling-calls
This is for a MAME driver, but it may be of use elsewhere.
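Since -ffast-math is part of the best setting above, a word of caution: in current GCC it implies -ffinite-math-only, which lets the compiler assume NaN and infinity never occur, so checks like x != x or isnan(x) may be optimized away. If code relies on such a check, one workaround is to test the raw bit pattern with integer operations, which the compiler is much less likely to remove. A minimal sketch in C, assuming 32-bit IEEE-754 floats; is_nan_bits is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* NaN test on the raw IEEE-754 bits: exponent all ones and a non-zero
   mantissa. Only integer operations are involved, so -ffinite-math-only
   has no floating-point comparison to fold away. */
int is_nan_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}
```

Infinity has the same all-ones exponent but a zero mantissa, so it is correctly reported as not NaN.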
For a normal project:
-O3 -fno-for-scope -fno-delayed-branch -fno-optimize-sibling-calls -funroll-all-loops -fschedule-insns2 -fexpensive-optimizations -fomit-frame-pointer -fstrict-aliasing -ffast-math
I found gains of 5% to 10%.
Hope someone finds it useful.