This patch adds IEEE-1800 compliant scheduling support for the Inactive
scheduling region used for #0 delays.
Implementing this requires that **all** IEEE-1800 active region events
are placed in the internal 'act' section. This has simulation
performance implications. It prevents some optimizations (e.g.
V3LifePost), which reduces single threaded performance. It also reduces
the available work and parallelism in the internal 'nba' section, which
reduced the effectiveness of multi-threading severely.
Performance impact on RTLMeter when using scheduling adjusted to support
proper #0 delays is ~10-20% slowdown in single-threaded mode, and ~100%
(2x slower) with --threads 4.
To avoid paying this performance penalty unconditionally, the scheduling
is only adjusted if either:
1. The input contains a statically known #0 delay
2. The input contains a variable #x delay unknown at compile time
If no #0 is present, but #x variable delays are, a ZERODLY warning is
issued advising the use of '--no-sched-zero-delay' which is a promise
by the user that none of the variable delays will evaluate to a zero
delay at run-time. This warning is turned off if '--sched-zero-delay'
is explicitly given. This is similar to the '--timing' option.
If '--no-sched-zero-delay' was used at compile time, then executing
a zero delay will fail at runtime.
A ZERODLY warning is also issued if a static #0 if found, but the user
specified '--no-sched-zero-delay'. In this case the scheduling is not
adjusted to support #0, so executing it will fail at runtime. Presumably
the user knows it won't be executed.
The intended behaviour with all this is the following:
No #0, no #var in the design (#constant is OK)
-> Same as current behaviour, scheduling not adjusted,
same code generated as before
Has static #0 and '--no-sched-zero-delay' is NOT given:
-> No warnings, scheduling adjusted so it just works, runs slow
Has static #0 and '--no-sched-zero-delay' is given:
-> ZERODLY on the #0, scheduling not adjusted, fails at runtime if hit
No static #0, but has #var and no option is given:
-> ZERODLY on the #var advising use of '--no-sched-zero-delay' or
'--sched-zero-delay' (similar to '--timing'), scheduling adjusted
assuming it can be a zero delay and it just works
No static #0, but has #var and '--no-sched-zero-delay' is given:
-> No warning, scheduling not adjusted, fails at runtime if zero delay
No static #0, but has #var and '--sched-zero-delay' is given:
-> No warning, scheduling adjusted so it just works
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.
Internals: Refactor AstNodeBlock representation (#6280)
AstNodeBlock now has 2 child lists: 'declsp' to hold declarations within
the block, and 'stmtsp' to hold the procedural statements.
AstBegin is then just a simple subtype of AstNodeBlock.
AstFork is a proper superset of AstNodeBlock (and also AstBegin), and
adds 'forksp' which hold the parallel statements. Having the sequential
'stmtsp' in AstFork is required to properly implement variable
initializers in fork blocks (IEEE 1800-2023 9.3.2), this makes that
clear, while also separating the non AstNodeStmt declarations
(for #6280). The actual fork branches in 'AstFork::forkps()' are all
AstBegin nodes. This is required as lowering stages will introduce
additional statements in each parallel branch. (We used to wrap AstFork
statements into AstBegin in 3 different places, now they always are
AstBegin and this is enforced via the type checker/V3Broken).
Also fixes incorrect disabling of forked processes from within the `fork`.
Initial idea was to remodel AssignW as Assign under Alway. Trying that
uncovered some issues, the most difficult of them was that a delay
attached to a continuous assignment behaves differently from a delay
attached to a blocking assignment statement, so we need to keep the
knowledge of which flavour an assignment was until V3Timing.
So instead of removing AstAssignW, we always wrap it in an AstAlways,
with a special `keyword()` type. This makes it into a proper procedural
statement, which is almost equivalent to AstAssign, except for the case
when they contain a delay. We still gain the benefits of #6280 and can
simplify some code. Every AstNodeStmt should now be under an
AstNodeProcedure - which we should rename to AstProcess, or an
AstNodeFTask). As a result, V3Table can now handle AssignW for free.
Also uncovered and fixed a bug in handling intra-assignment delays if
a function is present on the RHS of an AssignW.
There is more work to be done towards #6280, and potentially simplifying
AssignW handing, but this is the minimal change required to tick it off
the TODO list for #6280.