An ordering constraint between NBA commit blocks ('Post' logic) and the
written variable were previously added as soft constraints (cutable
edges). However these are required for correctness, so if it ever is
cut we will have incorrect simulation results.
Change these into hard constraints instead. This necessitates adding a
flag on AstVar to ignore special variables constructed during V3Delayed
that might otherwise appear as degenerate logic loops. E.g.:
if (VdlySet) {
VdlySet = 0; // <- This write to VdlySet can and must be ignored
LHS = VdlyVal;
}
No functional change, but you might get an error if this constraint was
ever violated. (Theoretically it should never be, as these variables
were inserted in a way that does not require violating these constraints
...)
* Refactor V3Delay for extensibility
Introduce the concept of an "NBA Scheme", which is the lowering pattern
we can use for various variables that are the targets of NBAs.
E.g.:
- ShadowVariable (old default scheme)
- FlagShared (old array set flag scheme)
- ValueQueueWhole (recent dynamic commit queue)
We now analyse all AstAssignDly before making any decisions on which
scheme to apply. We then choose a specific scheme for each variable that
is the target of an NBA, and then all NBAs targeting that variable use
the same scheme. This enables easy mix and match of schemes as needed,
while remaining consistent by design after extensions.
Output is perturbed due to node insertion order, but no functional
or performance change is intended.
For NBAs that might execute a dynamic number of times in a single
evaluation (specifically: those that assign to array elements inside
loops), we introduce a new run-time VlNBACommitQueue data-structure
(currently a vector), which stores all pending updates and the necessary
info to reconstruct the LHS reference of the AstAssignDly at run-time.
All variables needing a commit queue has their corresponding unique
commit queue.
All NBAs to a variable that requires a commit queue go through the
commit queue. This is necessary to preserve update order in sequential
code, e.g.:
a[7] <= 10
for (int i = 1 ; i < 10; ++i) a[i] <= i;
a[2] <= 10
needs to end with array elements 1..9 being 1, 10, 3, 4, 5, 6, 7, 8, 9.
This enables supporting common forms of NBAs to arrays on the left hand
side of <= in non-suspendable/non-fork code. (Suspendable/fork
implementation is unclear to me so I left it unchanged, see #5084).
Any NBA that does not need a commit queue (i.e.: those that were
supported before), use the same scheme as before, and this patch should
have no effect on the generated code for those NBAs.
No functional change. Postpone the conversion of all AstAssignDlys that
use the 'VdlySet' scheme for array LHSs until after the complete
traversal of the netlist. The next patch takes advantage of this by
using some extra information also gathered through the traversal to
change the conversion.
AstAssignDlys inside suspendable or fork are not deferred and are
processed identical to the previous version.
There are some TODOs in this patch that are fixed in the next patch.
Output code perturbed due to variable ordering.
MULTIDRIVEN message ordering perturbed due to processing order change.
No functional change.
This patch is just cleanup with some non-functional changes to enable
the next patch. Most importantly createDlyOnSet, which implements NBAs
for arrays, has a new streamlined implementation that does the same
thing. Some output code is perturbed due to statement/local variable
insertion order.
Also renamed Vdlyvfoo to VdlyFoo for easier readability of the generated
code.
Checking the wrong node meant we never actually pushed constant
bit-select indices into the delayed update, as was the intention, but
always generated a temporary instead.
This saves about 5% memory. V3AstUserAllocator is appropriate for most use
cases, performance is marginally up as we are mostly D-cache bound on
large designs.
The typical find/if-not-exists-insert pattern can be achieved with 1
lookup instead of 2 using emplace with a sentinel value. Also maps value
initialize their values when inserted with the [] operator, this is
defined and so there is no need to explicitly insert zeroes for integer
values.
Apart from the representational changes below, this patch renames
AstNodeMath to AstNodeExpr, and AstCMath to AstCExpr.
Now every expression (i.e.: those AstNodes that represent a [possibly
void] value, with value being interpreted in a very general sense) has
AstNodeExpr as a super class. This necessitates the introduction of an
AstStmtExpr, which represents an expression in statement position, e.g :
'foo();' would be represented as AstStmtExpr(AstCCall(foo)). In exchange
we can get rid of isStatement() in AstNodeStmt, which now really always
represent a statement
Peak memory consumption and verilation speed are not measurably changed.
Partial step towards #3420
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
special case of `--dumpi-<tag>` where tag can be a magic word, or a
filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.
Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.
Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).
The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.
There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
for suspending a stack of coroutines (normally, C++ coroutines are
stackless).
There is a new visitor in `V3Timing.cpp` which:
* scales delays according to the timescale,
* simplifies intra-assignment timing controls and net delays into
regular timing controls and assignments,
* simplifies wait statements into loops with event controls,
* marks processes and tasks with timing controls in them as
suspendable,
* creates delay, trigger scheduler, and fork sync variables,
* transforms timing controls and fork joins into C++ awaits
There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.
There is also a function that transforms forked processes into separate
functions.
See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278