When a lot of combinational logic is driven from top level inputs,
work can be wasted evaluating that logic if the top level inputs don't
change.
This change adds an optimization by performing a change detect on the
top level inputs, and evaluate 'ico' logic only if the top level input
actually changed. This especially helps with --hierarchical/--lib-create
which runs the 'ico' of each sub-model in the eval settle loop.
This was observed to yield 40%+ run-time speedup on some partitioned
designs.
The added change detection is cheap, so it is emitted even if the 'ico'
region is small, and is on by default.
The optimization is only sound if the model itself does not write to the
top level inputs (otherwise the 'previous value' variables would be out
of sync, which are not updated by internal writes.). If we can detect a
top level input is written within the design, then for that input, we
fall back on always running the relevant logic. With --vpi we cannot
prove safety statically, so --vpi will disable this optimisation unless
explicitly enabled. (In which case it's the user's responsibility to not
write to top level inputs via the VPI.)
As per discussion. Remove the unsound V3SplitAs pass. The
isolate_assignments attribute/directive is now parsed and ignored in the
frontend for compatibility but otherwise have no effect.
Fixes#7144
Dumping Dfg patterns can take a non-trivial amount of time, so do it
only with --dump-dfg-patterns, instead of with --stats.
Also further improve dumping format.
Remove parallel (using the FST library writer thread) and offloaded
(separate Verilator internal thread) tracing (only used by FST). These
are not compatible with #6992, and #5806 should yield better performance
in all cases.
Consequently mark '--trace-threads' and '--trace-fst-thread' options as
deprecated
Add jemalloc as an alternative malloc implementation for the Verilator
binary. When both tcmalloc and jemalloc are available, jemalloc is
preferred due to its better performance on RTLMeter.
The new --enable-jemalloc flag (default=check) mirrors the existing
--enable-tcmalloc behavior: auto-detected at configure time, supports
both static and dynamic linking, and is disabled when --enable-dev-asan
is active.
Introduce new pass that converts impure expressions, or those with
function and method calls into simple assignment statements. Please see
the blurb at the top of the file why this is useful and how it works.
In particular currently it enables more Dfg optimization as functions
will be inlined without AstExprStmt.
Ideally we should enforce this lowering is applied to every procedural
statement (there are still a handful of exceptions). With that, long
term with this pass + #6820, there should be no need to ever use an
AstExprStmt past this new lowering pass, which should enable more easier
optimization down the line.
Also ideally this should be run earlier. Currently it's after V3Tristate
as that calls pinReconnectSimple so we don't have to touch Cell ports.
Currently disabled when code coverage is enabled due to #7119.