Enable V3DfgCache to look up vertices without creating them. Reuse
existing terms in associative expression trees if they already exist
somewhere in the graph.
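As a rough illustration of the difference between a pure lookup and a
lookup that creates on a miss, here is a minimal standalone sketch.
VertexCache, Vertex, find() and getOrCreate() are hypothetical names
made up for this example; they are not V3DfgCache's actual interface.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a Dfg vertex: an operator over two operand ids.
struct Vertex {
    std::string op;
    int lhs;
    int rhs;
};

// Toy hash-consing cache (illustrative only, not the real V3DfgCache).
class VertexCache {
    using Key = std::tuple<std::string, int, int>;
    std::map<Key, int> m_index;      // structural key -> vertex id
    std::vector<Vertex> m_vertices;  // vertex storage, indexed by id

public:
    // Pure lookup: returns the id of an existing vertex, or -1 on a miss.
    // Never creates anything.
    int find(const std::string& op, int lhs, int rhs) const {
        const auto it = m_index.find(Key{op, lhs, rhs});
        return it == m_index.end() ? -1 : it->second;
    }

    // Lookup that creates the vertex on a miss.
    int getOrCreate(const std::string& op, int lhs, int rhs) {
        const int existing = find(op, lhs, rhs);
        if (existing >= 0) return existing;
        const int id = static_cast<int>(m_vertices.size());
        m_vertices.push_back(Vertex{op, lhs, rhs});
        m_index.emplace(Key{op, lhs, rhs}, id);
        return id;
    }

    std::size_t size() const { return m_vertices.size(); }
};
```

With a pure find(), an optimization can probe whether a candidate term
already exists and only rewrite the tree when it does, without growing
the graph on a failed probe.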
Add jemalloc as an alternative malloc implementation for the Verilator
binary. When both tcmalloc and jemalloc are available, jemalloc is
preferred due to its better performance on RTLMeter.
The new --enable-jemalloc flag (default=check) mirrors the existing
--enable-tcmalloc behavior: auto-detected at configure time, supports
both static and dynamic linking, and is disabled when --enable-dev-asan
is active.
Previously, 1D packed arrays of 'bit' or 'logic' were incorrectly
unrolled into elements when using --trace-structs if the array element
type was given via a typedef. Keep them as a single signal instead.
The lowerDistConstraints() function was not recursing into ConstraintIf
nodes, causing dist operators inside if-else blocks to remain unlowered
and trigger an internal error when ConstraintExprVisitor encountered them.
Fix by adding recursive handling of ConstraintIf nodes in lowerDistConstraints:
- Check for AstConstraintIf nodes before AstConstraintExpr
- Recursively process thensp() and elsesp() branches
- This ensures all dist operators are lowered regardless of nesting
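The recursion described above can be sketched on a toy tree. Node,
lowerDist() and hasDist() are hypothetical stand-ins invented for this
example (the real code works on AstConstraintIf / AstConstraintExpr
nodes), and "lowering" is reduced here to changing the node kind.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Toy constraint item: a plain expression, a dist expression, or an if.
struct Node {
    enum Kind { Expr, Dist, If };
    Kind kind;
    std::vector<std::unique_ptr<Node>> thens;  // only used when kind == If
    std::vector<std::unique_ptr<Node>> elses;  // only used when kind == If
    explicit Node(Kind k) : kind{k} {}
};

using NodeList = std::vector<std::unique_ptr<Node>>;

// Lower every Dist item, descending into both branches of each If first.
// Returns the number of items lowered.
int lowerDist(NodeList& items) {
    int lowered = 0;
    for (auto& item : items) {
        if (item->kind == Node::If) {
            lowered += lowerDist(item->thens);
            lowered += lowerDist(item->elses);
        } else if (item->kind == Node::Dist) {
            item->kind = Node::Expr;  // stand-in for the real rewrite
            ++lowered;
        }
    }
    return lowered;
}

// True if any Dist item remains anywhere in the tree.
bool hasDist(const NodeList& items) {
    for (const auto& item : items) {
        if (item->kind == Node::Dist) return true;
        if (hasDist(item->thens) || hasDist(item->elses)) return true;
    }
    return false;
}

// Builds the shape "if (...) { dist } else { dist }".
NodeList makeConditionalDist() {
    auto ifp = std::make_unique<Node>(Node::If);
    ifp->thens.push_back(std::make_unique<Node>(Node::Dist));
    ifp->elses.push_back(std::make_unique<Node>(Node::Dist));
    NodeList items;
    items.push_back(std::move(ifp));
    return items;
}
```

Without the If case in lowerDist(), both nested dist items would be
left in place, which corresponds to the failure mode described above.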
Test case: t_randomize_dist_conditional.v demonstrates conditional dist:
    constraint c {
      if (randd) {
        x dist { 8'd0 := 1, 8'd255 := 3 }; // 25% / 75%
      } else {
        x dist { 8'd0 := 3, 8'd255 := 1 }; // 75% / 25%
      }
    }
Fixes #7221
Also add array bounds and struct/union member counts to trace pushPrefix
(not used by vcd/fst/saif). Together these improve consistency in some
waveform formats.
- Allow reordering pure statements with DPI import calls iff no public
variables (including those read via a DPI export) are involved. This
ensures the DPI import can't observe the reordering.
- Allow reordering of pure statements with AstDisplay and AstStop. This
requires an assumption that AstDisplay and AstStop will not read or
write model state other than via a VarRef explicitly present in the
Ast.
Overall this allows eliminating a lot of conditionals around assertions,
which was previously not possible.
Introduce a new pass that converts impure expressions, or those with
function and method calls, into simple assignment statements. Please see
the blurb at the top of the file for why this is useful and how it works.
In particular currently it enables more Dfg optimization as functions
will be inlined without AstExprStmt.
Ideally we should enforce this lowering is applied to every procedural
statement (there are still a handful of exceptions). With that, long
term with this pass + #6820, there should be no need to ever use an
AstExprStmt past this new lowering pass, which should enable easier
optimization down the line.
Also ideally this should be run earlier. Currently it's after V3Tristate
as that calls pinReconnectSimple so we don't have to touch Cell ports.
Currently disabled when code coverage is enabled due to #7119.
The recent V3InlineCFuncs only checks AstCFunc::varsp for locals, but
V3Reloop used to insert them into AstCFunc::stmtsp resulting in multiple
locals with the same name being inlined into the caller if the stars
align. Fix Reloop. Such things will also go away with #6280.
Rename AstSelLoopVars to AstForeachHeader, and make it a non-NodeExpr.
Tweak parser to always create an AstForeachHeader, so no need to fix it
up later.
After conversion of Ast to Dfg, but before synthesizing AstAlways into
primitives, run a pass to remove variables that are not observable, and
all logic that only computes such variables. This can get rid of a lot
of content early so we don't build redundant Dfgs, and also enables
synthesizing always blocks that use temporaries only in some branches,
which will come in a follow up.
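The idea of dropping unobservable variables and the logic that only
feeds them can be sketched as a backward liveness fixed point. Logic and
removeUnobservable() are hypothetical names for this example, and the
model (named blocks with read/write sets) is an assumption, not the
actual Dfg data structure.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy logic block: reads some variables, writes others.
struct Logic {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Keep only logic that (transitively) contributes to an observable
// variable. 'live' starts as the set of observable variables.
std::vector<Logic> removeUnobservable(const std::vector<Logic>& logic,
                                      std::set<std::string> live) {
    std::vector<bool> keep(logic.size(), false);
    bool changed = true;
    while (changed) {  // iterate until no new block becomes live
        changed = false;
        for (std::size_t i = 0; i < logic.size(); ++i) {
            if (keep[i]) continue;
            bool drivesLive = false;
            for (const auto& w : logic[i].writes) {
                if (live.count(w)) drivesLive = true;
            }
            if (!drivesLive) continue;
            keep[i] = true;
            changed = true;
            // Everything a live block reads becomes live as well.
            for (const auto& r : logic[i].reads) live.insert(r);
        }
    }
    std::vector<Logic> result;
    for (std::size_t i = 0; i < logic.size(); ++i) {
        if (keep[i]) result.push_back(logic[i]);
    }
    return result;
}
```

In this sketch a chain that only computes variables nobody observes is
never marked live, so it is dropped as a whole.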
This patch adds IEEE-1800 compliant scheduling support for the Inactive
scheduling region used for #0 delays.
Implementing this requires that **all** IEEE-1800 active region events
are placed in the internal 'act' section. This has simulation
performance implications. It prevents some optimizations (e.g.
V3LifePost), which reduces single threaded performance. It also reduces
the available work and parallelism in the internal 'nba' section, which
reduces the effectiveness of multi-threading severely.
Performance impact on RTLMeter when using scheduling adjusted to support
proper #0 delays is ~10-20% slowdown in single-threaded mode, and ~100%
(2x slower) with --threads 4.
To avoid paying this performance penalty unconditionally, the scheduling
is only adjusted if either:
1. The input contains a statically known #0 delay
2. The input contains a variable #x delay unknown at compile time
If no #0 is present, but #x variable delays are, a ZERODLY warning is
issued advising the use of '--no-sched-zero-delay' which is a promise
by the user that none of the variable delays will evaluate to a zero
delay at run-time. This warning is turned off if '--sched-zero-delay'
is explicitly given. This is similar to the '--timing' option.
If '--no-sched-zero-delay' was used at compile time, then executing
a zero delay will fail at runtime.
A ZERODLY warning is also issued if a static #0 is found, but the user
specified '--no-sched-zero-delay'. In this case the scheduling is not
adjusted to support #0, so executing it will fail at runtime. Presumably
the user knows it won't be executed.
The intended behaviour with all this is the following:

  No #0, no #var in the design (#constant is OK)
    -> Same as current behaviour, scheduling not adjusted,
       same code generated as before

  Has static #0 and '--no-sched-zero-delay' is NOT given:
    -> No warnings, scheduling adjusted so it just works, runs slow

  Has static #0 and '--no-sched-zero-delay' is given:
    -> ZERODLY on the #0, scheduling not adjusted, fails at runtime if hit

  No static #0, but has #var and no option is given:
    -> ZERODLY on the #var advising use of '--no-sched-zero-delay' or
       '--sched-zero-delay' (similar to '--timing'), scheduling adjusted
       assuming it can be a zero delay and it just works

  No static #0, but has #var and '--no-sched-zero-delay' is given:
    -> No warning, scheduling not adjusted, fails at runtime if zero delay

  No static #0, but has #var and '--sched-zero-delay' is given:
    -> No warning, scheduling adjusted so it just works
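
The cases above can be condensed into a small decision function. This is
only a sketch of the intended behaviour as listed; ZeroDelayOpt,
SchedDecision and decideZeroDelay() are hypothetical names and do not
correspond to actual Verilator code.

```cpp
// How the user set the option on the command line (hypothetical enum).
enum class ZeroDelayOpt {
    Unset,    // neither option given
    Sched,    // --sched-zero-delay
    NoSched   // --no-sched-zero-delay
};

struct SchedDecision {
    bool adjustScheduling;  // schedule so that #0 works (slower)
    bool warnZerodly;       // issue a ZERODLY warning
};

// hasStaticZero: the design contains a statically known #0 delay.
// hasVarDelay:   the design contains a #x delay unknown at compile time.
SchedDecision decideZeroDelay(bool hasStaticZero, bool hasVarDelay,
                              ZeroDelayOpt opt) {
    // Nothing that could be a zero delay: same behaviour as before.
    if (!hasStaticZero && !hasVarDelay) return {false, false};
    // User promised no zero delay executes: do not adjust, but warn if
    // a static #0 is visibly present.
    if (opt == ZeroDelayOpt::NoSched) return {false, hasStaticZero};
    // Otherwise adjust. Warn only when relying on a variable delay
    // alone and the user gave no explicit option.
    const bool warn = !hasStaticZero && opt == ZeroDelayOpt::Unset;
    return {true, warn};
}
```

For example, decideZeroDelay(false, true, ZeroDelayOpt::Unset) yields an
adjusted schedule together with a ZERODLY warning, matching the fourth
case in the list.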