Set OPT_FAST=-Os as default (#2374)

Geza Lore 2020-05-28 00:57:49 +01:00 committed by GitHub
parent c5da38206e
commit 622f59ad65
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 80 additions and 63 deletions
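
As a rough illustration of the trade-off this commit documents, the shell sketch below picks an OPT_FAST value for a given use case and prints the matching make command. The function names `pick_opt_fast` and `make_cmd` and the mode names are hypothetical and not part of Verilator; the flag values and the `Vour.mk` / `Vour__ALL.a` names are taken from the manual text touched by this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of Verilator): map a use case to one of the
# OPT_FAST values mentioned in the BENCHMARKING & OPTIMIZATION text.
pick_opt_fast() {
    case "$1" in
        debug)    echo "-O0" ;;                    # fastest C++ compile, slowest model
        balanced) echo "-O1 -fstrict-aliasing" ;;  # quicker compile, often good speed
        *)        echo "-Os" ;;                    # the default set by this commit
    esac
}

# Print the make command instead of running it, so the sketch does not
# depend on a Verilated model being present.
make_cmd() {
    printf 'make OPT_FAST="%s" -f Vour.mk Vour__ALL.a\n' "$(pick_opt_fast "$1")"
}

make_cmd debug
make_cmd balanced
make_cmd regression
```

A variable given on the make command line overrides the assignment in the makefile for that run only, so the default in verilated.mk stays untouched.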

@@ -35,6 +35,9 @@ The contributors that suggested a given feature are shown in []. Thanks!
 **** The run-time library is now compiled with -Os by default. (#2369, #2373)
+**** OPT_FAST is now -Os by default. See the BENCHMARKING & OPTIMIZATION part
+of the manual if you experience issues with compilation speed.
 * Verilator 4.034 2020-05-03

@@ -2071,11 +2071,11 @@ distribution.
 =head1 BENCHMARKING & OPTIMIZATION
-For best performance, run Verilator with the "-O3 --x-assign fast
---x-initial fast --noassert" flags. The -O3 flag will require longer
-compile times, and "--x-assign fast --x-initial fast" may increase the risk
-of reset bugs in trade for performance; see the above documentation for
-these flags.
+For best performance, run Verilator with the "-O3 --x-assign fast --x-initial
+fast --noassert" flags. The -O3 flag will require longer time to run
+Verilator, and "--x-assign fast --x-initial fast" may increase the risk of
+reset bugs in trade for performance; see the above documentation for these
+flags.
 If using Verilated multithreaded, use C<numactl> to ensure you are using
 non-conflicting hardware resources. See L</"MULTITHREADING">.
@@ -2087,58 +2087,69 @@ simple change to a clock latch used to gate clocks and gained a 60%
 performance improvement.
 Beyond that, the performance of a Verilated model depends mostly on your
-C++ compiler and size of your CPU's caches.
+C++ compiler and size of your CPU's caches. Experience shows that large models
+are often limited by the size of the instruction cache, and as such reducing
+code size if possible can be beneficial.
-By default, the lib/verilated.mk file has optimization turned off. This is
-for the benefit of new users, as it improves compile times at the cost of
-simulation runtimes. To add optimization as the default, set one of three
-OPT, OPT_FAST, or OPT_SLOW lib/verilated.mk. Or, use the -CFLAGS and/or
--LDFLAGS option on the verilator command line to pass the flags directly to
-the compiler or linker. Or, just for one run, pass them on the command
-line to make:
+The supplied $VERILATOR_ROOT/include/verilated.mk file uses the OPT, OPT_FAST,
+OPT_SLOW and OPT_GLOBAL variables to control optimization. You can set these
+variables, when compiling the output of Verilator with Make, for example:
-make OPT_FAST="-Os -march=native -fno-stack-protector" -f Vour.mk Vour__ALL.a
+make OPT_FAST="-Os -march=native" -f Vour.mk Vour__ALL.a
-OPT_FAST specifies optimizations for those parts of the program that are on the
-fast path. This is mostly code that is executed every cycle. OPT_SLOW
-specifies optimizations for slow-path files, which execute only rarely, yet
-take a long time to compile with optimization on. OPT_SLOW is ignored if
-VM_PARALLEL_BUILDS is not 1, in which case all code is compiled with OPT_FAST.
-See also the C<--output-split> option. OPT specifies overall optimization and
-affects all compiles, including those OPT_FAST and OPT_SLOW control. For best
-results, use OPT="-Os -march=native", and link with "-static". Nearly the same
-results can be had with much better compile times with OPT_FAST="-O1
--fstrict-aliasing". Higher optimization such as "-O2" or "-O3" may help, but
-gcc compile times may be excessive under O3 on even medium sized designs.
-There is a third variable, OPT_GLOBAL, which applies to common code in the
-run-time library used by verilated models. This is set to "-Os" by default
-and there should rarely be a need to change it. As the run-time library is
-small in comparison to a lot of verilated models, disabling optimization on
-the run-time library should not have a serious effect on overall compilation
-time, but can have highly detrimental effect on run-time performance,
-especially with tracing. The OPT variable also applies to files that are
-controlled by OPT_GLOBAL.
+OPT_FAST specifies optimization flags for those parts of the model that are on
+the fast path. This is mostly code that is executed every cycle. OPT_SLOW
+applies to slow-path code, which executes rarely, often only once at the
+beginning or end of simulation. Note that OPT_SLOW is ignored if
+VM_PARALLEL_BUILDS is not 1, in which case all generated code will be compiled
+in a single compilation unit using OPT_FAST. See also the C<--output-split>
+option. The OPT_GLOBAL variable applies to common code in the run-time library
+used by verilated models (shipped in $VERILATOR_ROOT/include). Additional C++
+files passed on the verilator command line use OPT_FAST. The OPT variable
+applies to all compilation units in addition to the specific OPT_* variables
+described above.
-Unfortunately, using the optimizer with SystemC files can result in
-compiles taking several minutes. (The SystemC libraries have many little
-inlined functions that drive the compiler nuts.)
+You can also use the -CFLAGS and/or -LDFLAGS options on the verilator command
+line to pass flags directly to the compiler or linker.
-For best results, use the latest clang compiler (about 10% faster than
-GCC). Note the now fairly old GCC 3.2 and earlier have optimization bugs
-around pointer aliasing detection, which can result in 2x performance
-losses.
+The default values of the OPT_* variables are chosen to yield good simulation
+speed with reasonable C++ compilation times. To this end, OPT_FAST is set to
+"-Os" by default. Higher optimization such as "-O2" or "-O3" may help (though
+often they provide only a very small performance benefit), but compile times
+may be excessively large even with medium sized designs. Compilation times can
+be improved at the expense of simulation speed by reducing optimization, for
+example with OPT_FAST="-O0". Often good simulation speed can be achieved with
+OPT_FAST="-O1 -fstrict-aliasing" but with improved compilation times. Files
+controlled by OPT_SLOW have little effect on performance and therefore OPT_SLOW
+is empty by default (equivalent to "-O0") for improved compilation speed. In
+common use-cases there should be little benefit in changing OPT_SLOW.
+OPT_GLOBAL is set to "-Os" by default and there should rarely be a need to
+change it. As the run-time library is small in comparison to a lot of verilated
+models, disabling optimization on the run-time library should not have a
+serious effect on overall compilation time, but may have detrimental effect on
+simulation speed, especially with tracing. In addition to the above, for best
+results use OPT="-march=native", the latest Clang compiler (about 10% faster
+than GCC), and link statically.
-If you will be running many simulations on a single compile, investigate
-feedback driven compilation. With GCC, using -fprofile-arcs, then
+Generally the answer to which optimization level gives the best user experience
+depends on the use case and some experimentation can pay dividends. For a
+speedy debug cycle during development, especially on large designs where C++
+compilation speed can dominate, consider using lower optimization to get to an
+executable faster. For throughput oriented use cases, for example regressions,
+it is usually worth spending extra compilation time to reduce total CPU time.
+If you will be running many simulations on a single model, you can investigate
+profile guided optimization. With GCC, using -fprofile-arcs, then
 -fbranch-probabilities will yield another 15% or so.
 Modern compilers also support link-time optimization (LTO), which can help
-especially if you link in DPI code. To enable LTO on GCC, pass "-flto" in
-both compilation and link. Note LTO may cause excessive compile times on
-large designs.
+especially if you link in DPI code. To enable LTO on GCC, pass "-flto" in both
+compilation and link. Note LTO may cause excessive compile times on large
+designs.
-Using profile driven compiler optimization, with feedback from a real
-design, can yield up to30% improvements.
+Unfortunately, using the optimizer with SystemC files can result in compilation
+taking several minutes. (The SystemC libraries have many little inlined
+functions that drive the compiler nuts.)
 If you are using your own makefiles, you may want to compile the Verilated
 code with -DVL_INLINE_OPT=inline. This will inline functions, however this
@@ -5243,15 +5254,15 @@ test_regress/t/t_extend_class files show an example of how to do this.
 =item How do I get faster build times?
 When running make pass the make variable VM_PARALLEL_BUILDS=1 so that
-builds occur in parallel. Note this is now set by default if the output
-code size exceeds the value of --output-split.
+builds occur in parallel. Note this is now set by default if an output
+file was large enough to be split due to the --output-split option.
 Verilator emits any infrequently executed "cold" routines into separate
 __Slow.cpp files. This can accelerate compilation as optimization can be
-disabled on these routines. See the OPT_FAST and OPT_SLOW make variables.
+disabled on these routines. See the OPT_FAST and OPT_SLOW make variables
+and the BENCHMARKING & OPTIMIZATION section of the manual.
-Use a recent compiler. Newer compilers tend do be faster, with the
-now relatively old GCC 3.0 to 3.3 being horrible.
+Use a recent compiler. Newer compilers tend to be faster.
 Compile in parallel on many machines and use caching; see the web for the
 ccache, distcc and icecream packages. ccache will skip GCC runs between

@@ -84,18 +84,21 @@ CPPFLAGS += $(VM_USER_CFLAGS)
 LDFLAGS += $(VM_USER_LDFLAGS)
 LDLIBS += $(VM_USER_LDLIBS)
-# See the benchmarking section of bin/verilator.
-# Support class optimizations. This includes the tracing and symbol table.
-# SystemC takes minutes to optimize, thus it is off by default.
-#OPT_SLOW =
-# Fast path optimizations. Most time is spent in these classes.
-#OPT_FAST = -Os -fstrict-aliasing
-#OPT_FAST = -O
-#OPT_FAST =
+######################################################################
+# Optimization control.
+# See also the BENCHMARKING & OPTIMIZATION section of the manual.
+# Optimization flags for non performance-critical/rarely executed code.
+# No optimization by default, which improves compilation speed.
+OPT_SLOW =
+# Optimization for performance critical/hot code. Most time is spent in these
+# routines. Optimizing by default for improved execution speed.
+OPT_FAST = -Os
 # Optimization applied to the common run-time library used by verilated models.
 # For compatibility this is called OPT_GLOBAL even though it only applies to
 # files in the run-time library. Normally there should be no need for the user
-# to change this.
+# to change this as the library is small, but can have significant speed impact.
 OPT_GLOBAL = -Os
 #######################################################################

@@ -1111,7 +1111,7 @@ sub compile {
 "-DTEST_VERBOSE=\"".($self->{verbose} ? 1 : 0)."\"",
 "-DTEST_SYSTEMC=\"" .($self->sc ? 1 : 0). "\"",
 "-DCMAKE_PREFIX_PATH=\"".(($ENV{SYSTEMC_INCLUDE}||$ENV{SYSTEMC}||'')."/..\""),
-"-DTEST_OPT_FAST=\"" . ($param{benchmark} ? "-Os" : "") . "\"",
+"-DTEST_OPT_FAST=\"" . ($param{benchmark} ? "-Os" : "-O0") . "\"",
 "-DTEST_OPT_GLOBAL=\"" . ($param{benchmark} ? "-Os" : "-O0") . "\"",
 "-DTEST_VERILATION=\"" . $::Opt_Verilation . "\"",
 ]);
@@ -1130,7 +1130,7 @@ sub compile {
 "TEST_OBJ_DIR=$self->{obj_dir}",
 "CPPFLAGS_DRIVER=-D".uc($self->{name}),
 ($self->{verbose} ? "CPPFLAGS_DRIVER2=-DTEST_VERBOSE=1":""),
-($param{benchmark} ? "OPT_FAST=-Os" : ""),
+($param{benchmark} ? "" : "OPT_FAST=-O0"),
 ($param{benchmark} ? "" : "OPT_GLOBAL=-O0"),
 "$self->{VM_PREFIX}", # bypass default rule, as we don't need archive
 ($param{make_flags}||""),