Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.
Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).
The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.
There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
for suspending a stack of coroutines (normally, C++ coroutines are
stackless).
There is a new visitor in `V3Timing.cpp` which:
* scales delays according to the timescale,
* simplifies intra-assignment timing controls and net delays into
regular timing controls and assignments,
* simplifies wait statements into loops with event controls,
* marks processes and tasks with timing controls in them as
suspendable,
* creates delay, trigger scheduler, and fork sync variables,
* transforms timing controls and fork joins into C++ awaits
There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.
There is also a function that transforms forked processes into separate
functions.
See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
Fix compile error for queue method usage, if it is the
first statement in a block of code, and the return
value is not used. Example:
> if (foo)
> void'(bar.pop_front());
* Tests: Add a test to reproduce #3399
* Fix#3399. When reading an inout port in a module, it should refer the
original inout port, not the generated MODTEMP.
* Tests: Add a test to reproduce #3509
* Tests: Compile without tautological-compare check because bit op tree optimization is disabled in the test.
* Internals: Dedup code. No functional change is intended.
* Fix#3509.
"2'b10 == (2'b11 & {1'b0, val[0]})" and "2'b10 != (2'b11 & {1'b0, val[0]})" were
wrongly optimized to "!val[0]" and "val[0]" respectively.
Now properly optimize them to 1'b0 and 1'b1.
* Commentary
* Commentary: Update Changes
These have been 'deprecated' for 2 years and are otherwise unused except
for using a temporary placeholder value, which I have inlined with the
default value.
Also remove the now VL_TIME_STR_CONVERT utility function (and
corresponding unit tests), which have no references in any project on
GitHub.
Associative arrays that specify a wildcard index type may be indexed by
integral expressions of any size, with leading zeros removed
automatically. A natural representation for such expressions is a
string, especially that the standard explicitly specifies automatic
casts from string indices to bit vectors of equivalent size.
The automatic cast part is done implicitly by the existing type system.
A simpler way to just make this work would be to convert wildcard index
type to a string type directly in the parser code, but several new AST
classes are needed to make sure illegal method calls are detected.
The verilated data structure implementation is reused, because there is
no need for differentiating the behavior on C++ side.
Step towards a proper run-time library. Reduce the amount of ifdefs in
the implementation of offloaded tracing. There are still a very small
number of ifdefs left, which will need more careful changes in order to
keep user API compatibility.
* Tests: Add a test to reproduce #3470
* Update LSB during return path of traversal. No functional change is intended.
* Introduce LeafInfo::m_msb
* Update LeafInfo::m_msb when visitin AstCCast
* Internals: Add comment, reorder. No functional change is intended.
* Delete explicit from copy constructor to fix build error.
* Update Changes
* Internals: Remove unused parameter. No functional change is intended.
* Tests: Add explanation to t_const_opt.
* Tests: Check BitOpTree statistics in t_const_opt.
* Tests: Add a test to reproduce #3445
* Fix#3445. Don't forget LSB of frozen node in BitOpTreeOpt.
* Apply suggestions from code review
Co-authored-by: Geza Lore <gezalore@gmail.com>
V3MergeCond merges consecutive conditional `_ = cond ? _ : _` and
`if (cond) ...` statements. This patch adds an analysis and ordering
phase that moves statements with identical conditions closer to each
other, in order to enable more merging opportunities. This in turn
eliminates a lot of repeated conditionals which reduced dynamic branch
count and branch misprediction rate. Observed 6.5% improvement on
multi-threaded large designs, at the cost of less than 2% increase in
Verilation speed.
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278
Some cases of warnings about the use of blocking and non-blocking
assignments in combinational vs sequential processes were suppressed in
a way that is inconsistent with the *actual* current execution model of
Verilator. Turning these back on to, well, warn the user that these might
cause unexpected results. V5 will clean these up, but until then err on
the side of caution.
Fixes#864.
At the end of V3Param, fix up the module list to be topologically
sorted. We need to do this at the end as a later instantiation of a
recursive module might instantiate an earlier specialization, which we
cannot know until we processed everything. The rest of the compiler
depends on the module list being topologically sorted.
Fixes#3393
Rename AstNodeModule::hierName -> someInstanceName and explain that this
is only used for user messages.
Rename AstNode::locationStr -> instanceStr and simplify implementation.
In particular, do not report an instance if we can't find a reasonable
guess.
The --prof-threads option has been split into two independent options:
1. --prof-exec, for collecting verilator_gantt and other execution
related profiling data, and
2. --prof-pgo, for collecting data needed for PGO
The implementation of execution profiling is extricated from
VlThreadPool and is now a separate class VlExecutionProfiler. This means
--prof-exec can now be used for single-threaded models (though it does
not measure a lot of things just yet). For consistency VerilatedProfiler
is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are
in verilated_profiler.{h/cpp}, but can be used completely independently.
Also re-worked the execution profile format so it now only emits events
without holding onto any temporaries. This is in preparation for some
future optimizations that would be hindered by the introduction of function
locals via AstText.
Also removed the Barrier event. Clearing the profile buffers is not
notably more expensive as the profiling records are trivially
destructible.
Using the 'forceable' directive in a configuration file, or the /*
verilator forceable */ metacomment on a variable declaration will
generate additional public signals that allow the specified signals to
be forced/released from the C++ code.
- Add more tests, including for tracing.
- Apply some cleaner, more generic abstractions in the implementation.
- Use clearer AstRelease which is not an assignment.
* Tests: Add t_hier_block_sc_trace(fst|vcd) that tests tracing hierarchical block on SystemC.
* Add a check that elaboration is done before a trace file is opened.
* Add a check that elaboration is done before trace() is called to verilated SystemC model.
* Tests: call sc_core::sc_start(sc_core::SC_ZERO_TIME) before opening a trace file
* Tests: Fix t_trace_two_sc to call sc_start before opening trace
* Use vl_fatal as suggested in PR review.
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.
This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.
Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
* Add a test to reproduce #3197
* Fix#3197. Optimize correctly even if a variable is >32
* Quick exit instead of continue. No functional change is intended.
* Delete comment-out line.
* update per review comment
IEEE 1800-2017 6.11.3 says these types are unsigned. Until now these
types were treated as not having a signedness (NOSIGN), and nodes having
these types were later resolved by V3Width to be unsigned. This is a bit
problematic when creating nodes of these types after V3Width. Treating
these types as unsigned from the get go is fine, and actually improves
generated code slightly.
* Add a test to reproduce bug3182. Run the same HDL with -Oo to confirm the result is same.
* Hopefully fix#3182. The result can be 0 only when polarity is true (no AstNot is found during traversal).
* add tests to reproduce #3177.
Any random test circuits can be added to t_split_var_4.v later because it uses CRC to check the result while
t_split_var_0.v has just barrel shifters.
* Fix#3177. Don't merge assign statements if a variable is marked split_var.
- Merge AstNodeIf nodes as well (not just assignment from AstCond)
- Merge merged results recursively (optimizes nested conditionals/ifs)
- Only checking mergeability once per node.
- Don't add redundant masking
- Duplicate cheap statements in both branches, if doing so yields a
larger merge
- Include reduced nodes before the starting conditional in the merge
- Remove redundant casting
- Cheaper final XOR parity flip (~/^1 instead of != 0)
- Support XOR reduction of ~XOR nodes
- Don't add redundant masking of terms
- Support unmasked terms
- Add cheaper implementation for single bit only terms
- Ensure result is clean under all circumstances (this fixes current bugs)
Verilator should now correctly re-evaluate any logic that depends on
state set in a DPI exported function, including if the DPI export is
called outside eval, or if the DPI export is called from a DPI import.
Whenever the design contains a DPI exported function that sets a
non-local variable, we create a global __Vdpi_export_trigger flag, that
is set in the body of the DPI export, and make all variables set in any
DPI exported functions dependent on this flag (this ensures correct
ordering and change detection on state set in DPI exports when needed).
The DPI export trigger flag is cleared at the end of eval, which ensured
calls to DPI exports outside of eval are detected. Additionally the
ordering is modifies to assume that any call to a 'context' DPI import
might call DPI exports by adding an edge to the ordering graph from the
logic vertex containing the call to the DPI import to the DPI export
trigger variable vertex (note the standard does not allow calls to DPI
exports from DPI imports that were not imported with 'context', so we
do not enforce ordering on those).
- Use C++11 initialization syntax
- Use C++11 for loops
- Add const
- Factor out repeated _->fileline() sub-expressions
- Factor out issuing warning message
No functional change.
* EmitXml: Added <ccall>, <constpool>, <initarray>/<inititem>, wrapped children of <if> and <while> with <begin> elements to prevent ambiguity
* EmitXml: added signed="true" to signed basicdtypes
All parameters that are required in the output are now emitted as
'static constexpr, except for string or array of strings parameters,
which are still emitted as 'static const' (required as std::string is
not a literal type, so cannot be constexpr). This simplifies handling
of parameters and supports 'real' parameters.
This patch partitions AstCFuncs under an AstNodeModule based on which
header files they require for their implementation, and emits them
into separate files based on the distinct dependency sets. This helps
with incremental recompilation of the output C++.
A separate V3VariableOrder pass is now used to order module variables
before Emit. All variables are now ordered together, without
consideration for whether they are ports, signals form the design, or
additional internal variables added by Verilator (which used to be
ordered and emitted as separate groups in Emit). For single threaded
models, this is performance neutral. For multi-threaded models, the
MTask affinity based sorting was slightly modified, so variables with no
MTask affinity are emitted last, otherwise the MTask affinity sets are
sorted using the TSP sorter as before, but again, ports, signals, and
internal variables are not differentiated. This yields a 2%+ speedup for
the multithreaded model on OpenTitan.
The -G option now correctly parses simple integer literals as signed
numbers, which is in line with the standard and is significant when
overriding parameters without a type specifier.
Fixes#3060
* Move MTaskState to ThreadSchedule
MTaskState does not concern itself with sandbagging, and thus solely contains information related to the finalized schedule, i.e., completion time, thread ID and next MTask on thread.
* Add .dot graph visualization of ThreadSchedule
Follow-up to #2779.
This commit adds the creation of .dot files - used by GraphViz - to visualize how mtasks are statically scheduled across the set of specified threads.
We visualize each thread as a row, with nodes of a row being the mtasks scheduled for the given thread. The width of the mtask nodes are proportional to their cost. MTask dependencies are shown using an edge between the source and sink mtasks.
Tests used to silently pass when vcddiff aborted. Now fixed. Updated
large array trace reference files for FST, added same reference files
for VCD.
Developers need to update their local vcddiff.
We incorrectly treated two different struct types the same when passed
as an actual parameter to a `parameter type` parameter in an instance,
if the actual parameter expression both hash to the same value and the
structs have the same struct name. This is now corrected.
Fixes#3055.
This commit adds the '--simbenchmark' option to the regression test compile command.
The option is not intended as a fully-fledged benchmarking infrastructure, but rather a
utility for easily generating cycle- and execution time information when executing a verilated test.
As an example use case, the included test file shows how optimization level is varied across
three different builds+simulations, with the statistics for each run output to the same file in
the output directory.
Future work:
- 'sim_time' in the generated top-level main file should be a parameter.
- Given the above, the test execution script from verilog-sim-benchmark can be integrated
to generate better estimates of cycles/second through varying 'sim_time' over multiple executions.
This patch implements #3032. Verilator creates a module representing the
SystemVerilog $root scope (V3LinkLevel::wrapTop). Until now, this was
called the "TOP" module, which also acted as the user instantiated model
class. Syms used to hold a pointer to this root module, but hold
instances of any submodule. This patch renames this root scope module
from "TOP" to "$root", and introduces a separate model class which is
now an interface class. As the root module is no longer the user
interface class, it can now be made an instance of Syms, just like any
other submodule. This allows absolute references into the root module to
avoid an additional pointer indirection resulting in a potential speedup
(about 1.5% on OpenTitan). The model class now also contains all non
design specific generated code (e.g.: eval loops, trace config, etc),
which additionally simplifies Verilator internals.
Please see the updated documentation for the model interface changes.
These are necessary to link the executables. So far we have been saved
by one of the generated headers forward declaring these functions with
extern "C", but changing that header would break these tests.
* Add a test to reproduce #3023. Also applied verilog-mode formatting.
* use unique_ptr. No functional change is intended.
* Introduce restorer that reverts changes during iterate() if failed.
The goal of this patch is to move functionality related to constructing
the thread entry points and then invoking them out of V3EmitC (and into
V3Partition). The long term goal being enabling V3EmitC to emit
functions partitioned based on header dependencies. V3EmitC having to
deal with only AstCFunc instances and no other magic will facilitate
this.
In this patch:
- We construct AstCFuncs for each thread entry point in
V3Partition::finalize and move AstMTaskBody nodes under these functions.
- Add the invocation of the threads as text statements within the
AstExecGraph, so they are still invoked where the exec graph is located.
(the entry point functions are still referenced via AstCCall or
AstAddOrCFunc, so lazy declarations of referenced functions are created
automatically).
- Explicitly handle MTask state variables (VlMTaskVertex in
verilated_threads.h) within Verilator, so no need to text bash a lot of
these any more (some text refs still remain but they are all created
next to each other within V3Partition.cpp).
The effect of all this on the emitted code should be nothing but some
identifier/ordering changes. No functional change intended.
What previously used to be per module static constants created in
V3Table and V3Prelim are now merged globally within the whole model and
emitted as part of a separate constant pool. Members of the constant
pool are global variables which are declared lazily when used (similar to
loose methods).
This patch introduces the concept of 'loose' methods, which semantically
are methods, but are declared as global functions, and are passed an
explicit 'self' pointer. This enables these methods to be declared
outside the class, only when they are needed, therefore removing the
header dependency. The bulk of the emitted model implementation now uses
loose methods.
Using the standard model Makefile, when in addition to an explicit
target, the target 'ccache-report' is also given, a summary of ccache
hits/misses during this invocation of 'make' will be prited at the end
of the build.
A few names were incorrectly mangled, which made --protect-ids produce
invalid output when certain SV class constructs were uses. Now fixed and
added a few extra tests to catch this.
V3Reloop now can roll up indexed assignments between arrays if there is a
constant offset between indices on the left and right hand sides, e.g.:
a[0] = b[2];
a[1] = b[3];
...
a[x] = b[x + 2];
* Tests: Add more case that does not match native C++ width (8, 16, 32 or 64).
* Use AstVarRef::same() instead of AstNode::sameGateTree() because the latter checks dtype in addition to scope.
AstVarRef may have different minWidth in some cases,
but the difference should be ignored in the context of bitOpTree optimization.
This support code merely adds the capability to skip over the encrypted
parts. Many models have unencrypted module interfaces with ports, and
only encrypt the critical parts.
* Tests: Add another testcase that triggers assertion failure in bitOpTree opt.
* Fix assertion failure in bitOpTree opt reported in #2891. Consider the follwoing case.
CCast -> WordSel -> VarRef(leaf)
* Make sure that m_bitPolarity is expanded enough.
The intention was to not merge impure assignments, but the actual
predicate failed if the assignment was indeed pure.
This fix gains 1.5% speed on SweRV EH1.
This will still warn if a case item is completely covered by previous
items, but will no longer complain about overlaps like this:
priority casez (foo_i)
2'b ?1: bar_o = 3'd0;
2'b 1?: bar_o = 3'd1;
Before, there was a warning for the second statement because the first
two patterns match 2'b11.
Add V3OptionsParser that can suggest correct option.
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
Co-authored-by: github action <action@example.com>
** Add simulation context (VerilatedContext) to allow multiple fully independent
models to be in the same process. Please see the updated examples.
** Add context->time() and context->timeInc() API calls, to set simulation time.
These now are recommended in place of the legacy sc_time_stamp().
* Tests:Add some more signals to t_const_opt_red.v
* Optimize bit op trees such as ~a[2] & a[1] & ~a[0] to 3'b010 == (3'111 & a)
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Apply clang-format
* Don't edit and-or tree.
* Call matchBitOpTree() after V3Expand does its job.
* Internals: Rename newNodep -> newp. No functional change is intended.
* Internals: Remove m_sels. No functional change is intended.
* Internals: Remove stringstream. No functional change is intended.
* Internals: Use V3Number::bitIs1(). No functional change is intended.
* Internals: Use V3Number instead of std::map. Resolved laek. Result should be same.
* Internals: Resolve overload of setPolarity. No functional change is intended.
* Internals: Pass failure reason. No functional change is intended.
* Internals: Add VNUser::toPtr(). No functional change is intended.
* Internals: Use user4 instead of std::map. No functional change is intended.
* Catch up with the AST style aftre V3Expand
* Internals: Rename Context to VarInfo. No functional change is intended.
* Add some more test case
* tests:Add stats to tests
Update stats in t_merge_cond.pl as matchBitopTree does some of them.
* insert CCast if necessary
* small optimization to remove redundant bit mask
* No quick exit even when unoptimizable node is found.
* Simplify removing redundant And
* simplify
* Revert "Internals: Add VNUser::toPtr(). No functional change is intended."
This reverts commit f98dce10db.
* Consider AstWordSel and cleanup
* Update test
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Update src/V3Const.cpp
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Apply clang-format
* Internals: rename variables. No functional change is intended.
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
Co-authored-by: github action <action@example.com>
When using a "if" statement inside an always block, part of the code may
be unreachable. This can be used to avoid errors, but it generated an
error, this commit demotes this to a warning. Partly fixes#2625.
* Test hierarchical block that is explicitly set its default parameter value.
* Fix hierarchical verilation when a hierarchical block is instantiated with explicit setting of the default value.
Parameterized hierarchical block must have mangled name even when all parameters have default value,
otherwise the parameterized module will be hidden by protect-lib wrapper.
* rename variable names. No functional change is intended.
* Add a test for hierarchical verilation without timescale
* Emit timeunit in hierarchical wrapper only when it is specified in the input design or command line option.
* Update src/V3AstNodes.h
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
* Fix memory leak of t_trace_cat and t_trace_cat_renew
* Fix memory leak of t_trace_c_api
* Fix memory leak in t_trace_public_func and t_trace_public_func_vlt
* Fix memory leaks in t_flat_build (and probably more).
* Use unique_ptr in testcases
* Test that indexing into memory word fails
* VPI: don't index into memory word
Memory and memory words share a VerilatedVar, so check for memory word before attempting to index.
* Test that range iterator exhausts after 1 dimension
* Separate range iterator from range object
Range iterators should return NULL after all ranges have been returned
* Add a test to use unpacked array port in hierarchical verilation and protect-lib.
* V3EmitV supports unpacked array variables
* Can Emit local unpacked array properly
* Update golden of t_debug_emitv
* Support unpacked array port in protect-lib
* Remove t_prot_lib_unpacked_bad test as unpacked array is supported now.
* Add tests for unpacked array in DPI-C
* Add more generic parameter generator to AstNodes
* Supports multi dimensional array in DPI ( DPI argmuments <=> Verilator internal type conversion)
consider typedef in V3Task
fix export test
fix inout for scalar
support export func of time
* V3Premit does not show an error for wide words nor ArraySel
* Unnecessary pack func for unapcked array does not appear anymore
* Support unpacked array in runtime header
- Add an overload for lvalue VL_CVT_PACK_STR_NN
- Allow conversion from void *
* touch up tests for codacy advices
* resolve free functions. no functional change intended.
The end iterator always points to an element past the end of the list.
When new elements are added to the back of the list, they are inserted
before the end iterator.
Instead, track the last element in the list at the start of processing
and stop after it's been processed.
This patch normalizes what the tests do before exiting. After this
change each test should call final on the top module and explicitly
free the top module object before exiting.