This patch adds some abstract enums to pass to the trace decl* APIs, so
the VCD/FST specific code can be kept in verilated_{vcd,fst}_*.cc, and
removed from V3Emit*. It also reworks the generation of the trace init
functions (those that call 'decl*' for the signals) such that the scope
hierarchy is traversed precisely once during initialization, which
simplifies the FST writer. This later change also has the side effect of
fixing tracing of nested interfaces when traced via an interface
reference - see the change in the expected t_interface_ref_trace - which
previously were missed.
Some values emitted to the trace files are constant (e.g.: actual
parameter values), and never change. Previously we used to trace these
in the 'full' dumps, which also included all other truly variable
signals. This patch introduces a new generated trace function 'const',
to complement the 'full' and 'chg' flavour, and 'const' now only
contains the constant signals, while 'full' and 'chg' contain only the
truly variable signals. The generate 'full' and 'chg' trace functions
now have exactly the same shape. Note that 'const' signals are still
traced using the 'full*' dump methods of the trace buffers, so there is
no need for a third flavour of those.
Since we removed --threads 0 support, the 'threads()' option always
returns a value >= 1. Remove corresponding dead code.
Some of the coverage counters appear to use atomics even if the model is
single threaded. I'm under the impression this was a bug originally so
those ones I changed to use threads() > 1 instead.
* Add VL_ASSERT_CAPABILITY; add assumeLocked and pretendUnlock to V3Mutex.
* Pass jobs as template-arguments and use std::packaged_task.
* Add and use V3ThreadPool::ScopedExclusiveAccess.
This change introduces a custom reference-counting pointer class that
allows creating such pointers from 'this'. This lets us keep the
receiver object around even if all references to it outside of a class
method no longer exist. Useful for coroutine methods, which may outlive
all external references to the object.
The deletion of objects is deferred until the next time slot. This is to
make clearing the triggered flag on named events in classes safe
(otherwise freed memory could be accessed).
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
special case of `--dumpi-<tag>` where tag can be a magic word, or a
filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.
Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).
The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.
There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
for suspending a stack of coroutines (normally, C++ coroutines are
stackless).
There is a new visitor in `V3Timing.cpp` which:
* scales delays according to the timescale,
* simplifies intra-assignment timing controls and net delays into
regular timing controls and assignments,
* simplifies wait statements into loops with event controls,
* marks processes and tasks with timing controls in them as
suspendable,
* creates delay, trigger scheduler, and fork sync variables,
* transforms timing controls and fork joins into C++ awaits
There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.
There is also a function that transforms forked processes into separate
functions.
See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.
This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.
This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278