The necessary options to support C++ coroutines vary greatly between
compilers, its versions, and the standard library being used. This
patch makes the check for coroutine support more robust by adding a
declaration of a coroutine variable, instead of just including the
header. It also makes sure that the HAVE_COROUTINE and
CFG_CXXFLAGS_COROUTINES flags are always set together, and only when
coroutine support is detected.
Documentation states that minimum of all reported coverage of all signals in a line should be taken.
Previous logic would break if there were any signals with zero coverage followed by signals with
nonzero coverage - a minimum from those nonzero toggle count would be taken, disregarding zero
coverage of previous signals.
Internal-tag: [#62193]
Signed-off-by: Krzysztof Obłonczek <koblonczek@antmicro.com>
`$psprintf` is a non-standard system function present in some other
simulators, and has been rejected for standardization by IEEE because
of being basically the same as `$sformatf`.
To encourage users to fix their codebase, a warning is emitted by
default, but it gets otherwise interpreted as `$sformatf` as early as
during lexing.
Signed-off-by: Arkadiusz Kozdra <akozdra@antmicro.com>
* wording/formatting
Signed-off-by: Arkadiusz Kozdra <akozdra@antmicro.com>
---------
Signed-off-by: Arkadiusz Kozdra <akozdra@antmicro.com>
* Support 2D dynamic array initialization (#4700)
- new[] on sub arrays (as per original issue)
- Built-in methods for sub-arrays
- Initialization and literals assignmensts
- Dynamic array as an element for other arrays and queues
libcxx has removed the experimental/coroutine include file in favor of
the C++20-standard coroutine include. If the latter is available we
use it otherwise falling back to the existing experimental version (in
which case we also disable the deprecated-experimental-coroutine warning).
(See also https://reviews.llvm.org/D108697.)
operand order reversed for AstCMethodHard "neq"
interface between C-style arrays and VlUnpacked
overloads added to VlUnpacked::neq(), VlUnpacked::assign()
VlUnpacked::operator=() added
Fixes #5125
For NBAs that might execute a dynamic number of times in a single
evaluation (specifically: those that assign to array elements inside
loops), we introduce a new run-time VlNBACommitQueue data-structure
(currently a vector), which stores all pending updates and the necessary
info to reconstruct the LHS reference of the AstAssignDly at run-time.
All variables needing a commit queue has their corresponding unique
commit queue.
All NBAs to a variable that requires a commit queue go through the
commit queue. This is necessary to preserve update order in sequential
code, e.g.:
a[7] <= 10
for (int i = 1 ; i < 10; ++i) a[i] <= i;
a[2] <= 10
needs to end with array elements 1..9 being 1, 10, 3, 4, 5, 6, 7, 8, 9.
This enables supporting common forms of NBAs to arrays on the left hand
side of <= in non-suspendable/non-fork code. (Suspendable/fork
implementation is unclear to me so I left it unchanged, see #5084).
Any NBA that does not need a commit queue (i.e.: those that were
supported before), use the same scheme as before, and this patch should
have no effect on the generated code for those NBAs.
No functional change. Postpone the conversion of all AstAssignDlys that
use the 'VdlySet' scheme for array LHSs until after the complete
traversal of the netlist. The next patch takes advantage of this by
using some extra information also gathered through the traversal to
change the conversion.
AstAssignDlys inside suspendable or fork are not deferred and are
processed identical to the previous version.
There are some TODOs in this patch that are fixed in the next patch.
Output code perturbed due to variable ordering.
MULTIDRIVEN message ordering perturbed due to processing order change.
With --stats, we will print DFG pattern combinations, one per line, as
S-expressions to new stat files, together with their frequency, to aid
discovery of new peephole patterns.
This saves about 5% memory. V3AstUserAllocator is appropriate for most use
cases, performance is marginally up as we are mostly D-cache bound on
large designs.
Prior to this we failed to create implicit nets for inputs of gate
primitives, which is required by the standard (IEEE 1800-2017 6.10).
Note: outputs were covered due to being modeled as the LHS of
assignments, which do create implicit nets.
Again --prof-exec have bit-rotted a little with all the recent changes
to the structure of the generated code. This patch contains a few
improvements:
- Repalce the eval/evl_loop begin/end events with generic
section_push/section_pop events, that can be arbitrarily sprinkled
into the generate code (so long as they are matched correctly) to
measure various sections. The report then contains a nested profile
of the sections, and the VCD trace shows the section names.
- Better handling of exec graphs
- Clearer overall statistics
According to 1800-2017 36.3, 1800-2017 A.9.3, 1364-2005 20.2 and 1364-2005 A.9.3, user defined system task and function identifiers can use the same character set for the second character as all the following characters.
This API is used if the user copies the process using `fork`
and similar OS-level mechanisms. The `at_clone` member function
ensures that all model-allocated resources are re-allocated, such
that the copied child process/model can simulate correctly.
A typical allocated resource is the thread pool, which every model
has its own pool.
This makes the implementation of the detection and propagation of the
suspendable property simpler and easier to read. More importantly, there are no
more jumps around the AST with the `visit` functions, which in some cases could
result in incorrect visitor context while in the `visit` function. See the added
test, which would cause Verilator to segfault before this patch.
In testing, verilation performance was not shown to be affected by this change.
Though there is a slight performance improvement from this patch, due to adding
one more check before refreshing class member cache.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
* V3Common.cpp::makeVlToString: fix `VL_TOSTRING_W` statement generation to include width argument
* fix contribution name
* add testcase for long struct `VL_TO_STRING_W` bug
`VlNow{}` is completely unnecessary, as coroutines are always on the
heap (unless optimized out). Also fix access of var ref passed to forked processes.
Apart from the representational changes below, this patch renames
AstNodeMath to AstNodeExpr, and AstCMath to AstCExpr.
Now every expression (i.e.: those AstNodes that represent a [possibly
void] value, with value being interpreted in a very general sense) has
AstNodeExpr as a super class. This necessitates the introduction of an
AstStmtExpr, which represents an expression in statement position, e.g :
'foo();' would be represented as AstStmtExpr(AstCCall(foo)). In exchange
we can get rid of isStatement() in AstNodeStmt, which now really always
represent a statement
Peak memory consumption and verilation speed are not measurably changed.
Partial step towards #3420
In non-static contexts like class objects or stack frames, the use of
global trigger evaluation is not feasible. The concept of dynamic
triggers allows for trigger evaluation in such cases. These triggers are
simply local variables, and coroutines are themselves responsible for
evaluating them. They await the global dynamic trigger scheduler object,
which is responsible for resuming them during the trigger evaluation
step in the 'act' eval region. Once the trigger is set, they await the
dynamic trigger scheduler once again, and then get resumed during the
resumption step in the 'act' eval region.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
Also added a testing only -fno-const-before-dfg option, as otherwise
V3Const eats up a lot of the simple inputs. A lot of the things V3Const
swallows in the simple cases can make it to DFG in complex cases, or DFG
itself can create them during optimization. In any case to save
complexity of testing DFG constant folding, we use this option to turn
off V3Const prior to the DFG passes in the relevant test.
Use the same style, and reuse the bulk of astgen to generate DfgVertex
related code. In particular allow for easier definition of custom
DfgVertex sub-types that do not directly correspond to an AstNode
sub-type. Also introduces specific names for the fixed arity vertices.
No functional change intended.
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
special case of `--dumpi-<tag>` where tag can be a magic word, or a
filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.