AstCAwait is only ever used in statement position, so model it as a
statement. We should never ever have a coroutine that returns a value.
There is no need for it in SV, nor should we rely on it for internals.
Also reworks the fix for V3Life incorrectly constant propagating the
beforeTrig functions (#7072). The property that upsets V3Life is that
a function:
1. Is called from multiple static call sites (multiple AstCCall)
2. Reads model state directly (AstVarRef to non-locals/arguments)
Such functions can only be created internally after scheduling (V3Task
throws an unsupported error on a non-inlined function that reads model
state), so added a flag to AstCFunc to mark the dangerous ones for
V3Life.
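The condition the new flag captures can be sketched as a predicate. This is a minimal illustration that uses a hypothetical `FuncSummary` struct instead of the real AstCFunc interface. It only encodes the two properties listed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of an AstCFunc; NOT the real AstCFunc API.
struct FuncSummary {
    std::string name;
    int staticCallSites;   // number of AstCCall nodes that call it
    bool readsModelState;  // has AstVarRef to non-local, non-argument variables
};

// Both properties together make a function unsafe for naive constant
// propagation in V3Life: this is what the new flag would mark.
bool isDangerousForLife(const FuncSummary& f) {
    return f.staticCallSites > 1 && f.readsModelState;
}
```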
Many rules in the Dfg Peephole pass check if a node has more than one
sink. Redundant variables that will ultimately be removed can prevent
these from matching. Remove such variables during the Peephole pass
itself to enable more matches.
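A toy sketch of why an unread variable blocks single-sink rules and how dropping it helps. The graph type, the `mustKeep` flag, and the notion of "redundant" used here (a variable with no sinks that need not be kept) are assumptions for illustration, not the DfgGraph API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy DAG, not Verilator's DfgGraph. Vertices are addressed by index.
struct Vertex {
    bool isVar;              // variable vertex (vs. an operation)
    bool mustKeep;           // hypothetical: variable is observable, e.g. a port
    int driver;              // index of the driving vertex, or -1
    std::vector<int> sinks;  // indices of consuming vertices
};

// Many peephole rules only fire when a vertex has a single sink.
bool hasMultipleSinks(const std::vector<Vertex>& g, int v) {
    return g[static_cast<std::size_t>(v)].sinks.size() > 1;
}

// Detach variables that nothing reads and that need not be kept. Removing
// such a variable from its driver's sink list can bring the driver back to
// a single sink, so single-sink rules can match in the same pass.
void removeRedundantVars(std::vector<Vertex>& g) {
    for (std::size_t i = 0; i < g.size(); ++i) {
        Vertex& v = g[i];
        if (!v.isVar || v.mustKeep || !v.sinks.empty() || v.driver < 0) continue;
        std::vector<int>& ds = g[static_cast<std::size_t>(v.driver)].sinks;
        ds.erase(std::remove(ds.begin(), ds.end(), static_cast<int>(i)), ds.end());
        v.driver = -1;
    }
}
```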
This is an attempt to generate an identical trace file scope hierarchy
both with and without -fno-inline, primarily because it's needed for
testing in an upcoming patch, but it also improves consistency prior to #7001.
This is primarily cleanup, but there are 2 functional changes included:
- It used to accidentally reorder bodies of AstNodeIf that were outside
an AstAlways. Now it will not touch anything outside an AstAlways.
- Removed one redundant edge from the graph which perturbs the result of
V3Graph::acyclic. This should make no difference for the actual
intended result of reordering NBAs to eliminate shadow variables.
Add a new Dfg pass 'pushDownSel'. This will try to move selects through
a tree of concatenations in order to eliminate temporary nodes holding
intermediate concatenation results. This can get rid of a lot of
variables when packed arrays are assigned in parts (e.g. bit-wise).
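A minimal sketch of pushing a select through a concatenation tree, assuming a toy expression type where `{lhs, rhs}` places `lhs` in the most significant bits. It returns a string rendering of the rewritten expression instead of building new Dfg vertices, and is not the actual pass implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression: a named leaf, or a concatenation {lhs, rhs} with lhs in
// the most significant bits.
struct Expr {
    std::string name;                // leaf name (empty for a concat)
    int width;
    std::shared_ptr<Expr> lhs, rhs;  // both null for a leaf
};
using ExprP = std::shared_ptr<Expr>;

ExprP leaf(const std::string& n, int w) {
    return std::make_shared<Expr>(Expr{n, w, nullptr, nullptr});
}
ExprP concat(ExprP l, ExprP r) {
    const int w = l->width + r->width;
    return std::make_shared<Expr>(Expr{"", w, l, r});
}

// Render e[lsb +: width] with the select pushed down to the leaves, so the
// intermediate concatenation results are never materialized.
std::string pushSel(const ExprP& e, int lsb, int width) {
    assert(lsb >= 0 && width > 0 && lsb + width <= e->width);
    if (!e->lhs) {  // leaf
        if (lsb == 0 && width == e->width) return e->name;
        if (width == 1) return e->name + "[" + std::to_string(lsb) + "]";
        return e->name + "[" + std::to_string(lsb + width - 1) + ":" + std::to_string(lsb) + "]";
    }
    const int rw = e->rhs->width;
    if (lsb + width <= rw) return pushSel(e->rhs, lsb, width);  // only low part
    if (lsb >= rw) return pushSel(e->lhs, lsb - rw, width);     // only high part
    // Straddles both operands: split the select in two.
    return "{" + pushSel(e->lhs, 0, lsb + width - rw) + ", " + pushSel(e->rhs, lsb, rw - lsb) + "}";
}
```

For `e = {a, {b, c}}` with 4-bit operands, selecting bits [9:6] gives `{a[1:0], b[3:2]}`, so no temporary holding `{b, c}` is needed.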
Use the uint32_t max value instead of zero as the sentinel value for an
unassigned trace code. Prep for a follow-on patch.
Note the actual trace file will still start codes from one; the codes
in the model are just offsets from the base code.
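A sketch of the sentinel choice, with hypothetical names: once the maximum `uint32_t` value marks an unassigned code, zero becomes an ordinary offset, and the code written to the file is formed by adding the base.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel for "no trace code assigned yet".
constexpr uint32_t kUnassignedCode = std::numeric_limits<uint32_t>::max();

// Zero is now a valid (assigned) offset.
bool isAssigned(uint32_t code) { return code != kUnassignedCode; }

// Model codes are offsets from a base; the file code is base + offset,
// so the trace file can still start numbering from one.
uint32_t fileCode(uint32_t baseCode, uint32_t offset) {
    assert(isAssigned(offset));
    return baseCode + offset;
}
```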
We use special C++ types for ports, e.g. SystemC types in --sc mode, and native C arrays for unpacked arrays in --cc mode. These types are not substitutable for internal types (e.g. VlUnpacked); however, all the runtime primitives expect internal types.
I think the intention was to use these special IO types only for top-level ports, but the current implementation also uses them for the ports of all non-inlined modules. This means the output C++ will not compile if such a port is passed to a runtime primitive (e.g. array 'sort' as in the new test) or to a DPI import.
Changed to use the special IO types only for the top-level ports.
Note these cases are likely still broken if a runtime primitive is invoked on a top-level port (we might be saved by wrapTop, but later optimizations might eliminate the intermediary).
Re-inline ConstPool entries in V3Subst that have been expanded into
word-wise accesses by V3Expand. This enables downstream constant folding
on the word-wise expressions.
As V3Subst now understands ConstPool entries, we can also omit expanding
straight assignments with a ConstPool entry on the RHS. This allows the
C++ compiler to see the memcpy directly.
In V3Expand, expand wide SHIFTL and SHIFTR if the shift amount is known and is a
multiple of VL_EDATA_SIZE. This case results in each word requiring a
simple copy from the original, or store of a constant zero, which
subsequent V3Subst can then eliminate.
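Why a word-aligned shift is cheap can be seen in a small sketch over 32-bit words (assuming `VL_EDATA_SIZE` is 32, i.e. one `EData` word; the function names are hypothetical). Every result word is a plain copy of some input word or a zero, which is the shape a later substitution pass can remove.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 32-bit words, as with VL_EDATA_SIZE. Word 0 is least significant.
constexpr unsigned kWordBits = 32;

// Left shift by a multiple of the word size: out[i] = in[i - k] or zero.
std::vector<uint32_t> shiftLeftWords(const std::vector<uint32_t>& in, unsigned shift) {
    assert(shift % kWordBits == 0);
    const std::size_t k = shift / kWordBits;
    std::vector<uint32_t> out(in.size(), 0);
    for (std::size_t i = k; i < in.size(); ++i) out[i] = in[i - k];
    return out;
}

// Right shift by a multiple of the word size: out[i] = in[i + k] or zero.
std::vector<uint32_t> shiftRightWords(const std::vector<uint32_t>& in, unsigned shift) {
    assert(shift % kWordBits == 0);
    const std::size_t k = shift / kWordBits;
    std::vector<uint32_t> out(in.size(), 0);
    for (std::size_t i = 0; i + k < in.size(); ++i) out[i] = in[i + k];
    return out;
}
```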
A temporary introduced by V3Premit could not be eliminated in V3Subst if
it was involved in an expression that did a write back to a
non-temporary. To enable removing these, we need to track all variables
in V3Subst, not just the ones we would consider for elimination. Note
the new implementation is marginally faster than the old one even though
it does more work. It can eliminate ~5% more wide temporaries on some
designs. The algorithm is largely the same.
Concatenations that are only used by Sel expressions that do not consume
some bits on the edges can be narrowed to not compute the unused bits.
E.g.: `{a[4:0], b[4:0]}[5:4]` -> `{a[0], b[4]}[1:0]`
This is a superset of the PUSH_SEL_THROUGH_CONCAT DFG pattern, which is
removed.
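A sketch of the narrowing computation, assuming the union of the bit ranges read by all consuming Sel nodes is already known. `Part` and `narrowConcat` are hypothetical names. The function keeps only the intersection of each operand with the used range; consumers then subtract the low bound from their indices (so `[5:4]` becomes `[1:0]` in the example above).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Part { std::string name; int width; };  // concat operand, MSB first

// Keep only bits [usedMsb:usedLsb] of {parts...}; returns the narrowed
// operands, each rendered as name[hi:lo] or name[bit].
std::vector<std::string> narrowConcat(const std::vector<Part>& parts, int usedMsb, int usedLsb) {
    int total = 0;
    for (const Part& p : parts) total += p.width;
    std::vector<std::string> kept;
    int hi = total - 1;  // absolute bit index of the current operand's MSB
    for (const Part& p : parts) {
        const int lo = hi - p.width + 1;
        // Intersect [hi:lo] with the used range, in operand-local indices.
        const int keepHi = std::min(hi, usedMsb) - lo;
        const int keepLo = std::max(lo, usedLsb) - lo;
        if (keepHi == keepLo) {
            kept.push_back(p.name + "[" + std::to_string(keepLo) + "]");
        } else if (keepHi > keepLo) {
            kept.push_back(p.name + "[" + std::to_string(keepHi) + ":" + std::to_string(keepLo) + "]");
        }  // else: operand is entirely unused and dropped
        hi = lo - 1;
    }
    return kept;
}
```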
Minor performance improvement, especially for assertion-heavy code.
Strings are often used as temporaries in unlikely branches. Do not
localize them to avoid an unnecessary initialization on function entry.
* logging for the unsatisfied constraints
* Apply 'make format'
* fix the quote error in the array indexing
* Apply 'make format'
* Length change for the hash for randomness when a named assertion is used
* separate named assertion and satisfied case
* Apply 'make format'
* simplify comments and display info
* refine code and fix protect case
* format
* update display in test and .out file
* add an enable flag and warning type, add a protect_id version test and update out files
* Apply 'make format'
* simplify some comments
* update out file, ready to be merged.
* update .py file to set the hash key solid
* rename and reformat the warning message to follow the verilator style
* add a nowarn test
* Apply 'make format'
* ordering
---------
Co-authored-by: Udaya Raj Subedi <075bei047.udaya@pcampus.edu.np>
Co-authored-by: github action <action@example.com>
* Tests: Add a test checking whether the signedness of a packed array is properly implemented.
* Fix signedness of a packed array when named type is not used.
* Fix signedness of the entire packed array.
Track the location based message/feature enable bits separately for code
and control file directives. A message/feature is disabled if it is disabled
either in the control file or in code directives/metacomments. That is, it is
enabled only if both agree it should be enabled.
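The combination rule amounts to a bitwise AND of two independent enable sets. A minimal sketch, assuming a hypothetical fixed number of message codes and one `LocationEnables` record per location:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical number of message/feature codes.
constexpr std::size_t kNumCodes = 8;

// Per-location enable bits, tracked separately per source.
struct LocationEnables {
    std::bitset<kNumCodes> code;     // from in-code directives/metacomments
    std::bitset<kNumCodes> control;  // from control file directives

    LocationEnables() { code.set(); control.set(); }  // all enabled by default

    // Enabled only if both sources agree it should be enabled.
    bool enabled(std::size_t msg) const { return code.test(msg) && control.test(msg); }
};
```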
Note this might miss some cases where a sub-tree within an And/Or/Xor
tree is optimizable even though the whole tree is not, but in practice this
seems to work better than the alternative of keeping a set of failed nodes
and bailing early.
Now that we have an efficient algorithm to analyse which bits in a
combinational cycle are not dependent on the cycle, we can simplify the
cycle fixup algorithms. Remove FixUpSelDrivers: this was a heuristic
to save on the expensive independent bits analysis, but itself can
cause a performance problem on certain inputs that result in a large
number of attempted fixups. Doing this simplifies the driver tracing
algorithm, and because we now only attempt to trace drivers that are
known to be independent of the cycles, it should always succeed...
Unless of course there is a mismatch between the independent bit
analysis and the driver tracing algorithm. In such a case (when we managed
to prove independence, but then fail to trace a driver), we will crash,
which is still easier to sv-bugpoint than a performance bug.
Fixes #6744
Forceable/externally written variables cannot be used as the canonical
result variable for a Dfg value, as the variable's value can be
inconsistent with its Dfg drivers (e.g. when forced).
This removes a factor N from DfgBreakCycles, by doing the necessary data
flow analysis for the entire graph up front, and reusing the result for
all subsequent cycle fixups in the current iteration.
Fixes #6731
Combined the 3 different APIs used in EmitC* passes to handle file
opening/splitting into a single one. This removes a lot of copy paste
and makes everything consistent.
All C++ file handling goes through `EmitCBaseVisitor` using the
`openNewOutputHeaderFile`, `openNewOutputSourceFile` and
`closeOutputFile` methods.
To emit a new kind of file, always derive a new class from
`EmitCBaseVisitor` and use the above APIs; they will take care of
everything else in a consistent manner.
Subsequently also removed V3OutSCFile, and instead included
verilated_sc.h (which includes the SystemC header itself) in the two
files that need it (the primary model header, and the root module
header).
Functional changes:
- The PCH header did not previously have a corresponding AstCFile. Now it
does, though this makes no difference in the output
- All 'slow' sources now have '__Slow' in the name automatically (the
only one missing was for the ConstPool files)
Rest of the output is identical except for the header line now being
present in all generated C++ files.
The Syms class can contain a very large number of VerilatedScope
instances if `--vpi` is used, all of which need a call to the default
constructor in the constructor of the Syms class. This can lead to very
long compilation times, even without optimization on some compilers.
To avoid the constructor calls, hold VerilatedScope via pointers in the
Syms class, and explicitly new and delete them in the Syms
constructor/destructor. These explicit new/delete calls can then be
automatically split up into sub functions when the Syms
constructor/destructor become large.
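A reduced C++ sketch of the pointer-based layout. `ScopeStub` stands in for VerilatedScope and only counts live instances; `SymsSketch` and its `ctor_0`/`ctor_1`/`dtor_0`/`dtor_1` helpers are hypothetical names showing how the explicit new/delete statements can be spread over sub-functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for VerilatedScope that counts live instances (illustration only).
struct ScopeStub {
    static int live;
    ScopeStub() { ++live; }
    ~ScopeStub() { --live; }
};
int ScopeStub::live = 0;

// Holding scopes via pointers leaves no member constructor calls in the
// Syms constructor; the new/delete statements live in splittable helpers.
class SymsSketch {
    std::array<ScopeStub*, 4> m_scopes{};
    void ctor_0() { m_scopes[0] = new ScopeStub; m_scopes[1] = new ScopeStub; }
    void ctor_1() { m_scopes[2] = new ScopeStub; m_scopes[3] = new ScopeStub; }
    void dtor_0() { delete m_scopes[0]; delete m_scopes[1]; }
    void dtor_1() { delete m_scopes[2]; delete m_scopes[3]; }
public:
    SymsSketch() { ctor_0(); ctor_1(); }
    ~SymsSketch() { dtor_0(); dtor_1(); }
    SymsSketch(const SymsSketch&) = delete;
    SymsSketch& operator=(const SymsSketch&) = delete;
    // Access now goes through a pointer: one extra load per use.
    const ScopeStub* scope(std::size_t i) const { return m_scopes[i]; }
};
```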
Regarding run-time performance, this should have no significant effect;
most interactions are either during construction/destruction of the Syms
object, or are via pointers already. The one place where we used to
refer to VerilatedScope instances is when emitting an AstScopeName for
things like $display %m. For those there will be an extra load
instruction at run-time, which should not make a big difference.
Patch 3 of 3 to fix long compile times of the Syms module in some
scenarios.
Splitting of the Syms constructor/destructor was somewhat arbitrarily
enforced, with some parts splittable and others not. There was also an
issue that even if the constructor and destructor bodies were split, we
would still end up with both in the same file that was double the size of
the intended split limit.
To fix, first all statements required in the Syms constructor and
destructor are gathered into a vector, then if the total number of
statements required for both is bigger than the split limit, the
implementations are split into sub-functions, one per file, as before,
ensuring that none of the functions are bigger than the split limit.
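The decision described above can be sketched as follows. The statement strings and the `planSyms`/`chunk` helpers are hypothetical; the point is that the split decision uses the combined constructor and destructor size, and each resulting chunk stays within the limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Stmts = std::vector<std::string>;

// Chop a statement list into chunks of at most 'limit' statements; each
// chunk would become one sub-function in its own file.
std::vector<Stmts> chunk(const Stmts& stmts, std::size_t limit) {
    assert(limit > 0);
    std::vector<Stmts> out;
    for (std::size_t i = 0; i < stmts.size(); i += limit) {
        const std::size_t end = std::min(stmts.size(), i + limit);
        out.emplace_back(stmts.begin() + i, stmts.begin() + end);
    }
    return out;
}

struct SymsSplitPlan {
    bool split;  // false: both bodies stay in one file
    std::vector<Stmts> ctorParts;
    std::vector<Stmts> dtorParts;
};

// Decide on the combined size, so the unsplit file never exceeds the limit.
SymsSplitPlan planSyms(const Stmts& ctor, const Stmts& dtor, std::size_t limit) {
    if (ctor.size() + dtor.size() <= limit) return {false, {ctor}, {dtor}};
    return {true, chunk(ctor, limit), chunk(dtor, limit)};
}
```

With a limit of 4, a 3-statement constructor and a 2-statement destructor each fit alone, but together they exceed the limit, so they are split into separate functions and files.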
Also add __Slow suffix to the names of the files.
Patch 2 of 3 to fix long compile times of the Syms module in some
scenarios.
In order to avoid long compile times of the Syms constructor due to
having a very large number of member constructors to call, move to using
explicit ctor/dtor functions for all but the root VerilatedModule. The
root module needs a constructor as it has non-default-constructible
members. The other modules don't.
This is only part of the fix, as in order to avoid a default
constructor call, the VerilatedModule needs to be default constructible.
I think this is now true for modules that do not contain strings or
other non-trivially constructible/destructible variables.
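The property relied on here is presumably trivial default constructibility, which can be checked with standard type traits. The module structs and the `init` helper below are hypothetical stand-ins, not generated Verilator code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Scalar-only module with no user-provided constructor: declaring it as a
// member emits no constructor call; setup happens in an explicit function.
struct PlainModule {
    uint32_t count;
    uint8_t flag;
    void init() { count = 0; flag = 0; }  // hypothetical explicit "ctor"
};

// A std::string member forces a non-trivial constructor and destructor.
struct ModuleWithString {
    std::string name;
    uint32_t count;
};

static_assert(std::is_trivially_default_constructible<PlainModule>::value,
              "scalar-only module needs no constructor call");
static_assert(!std::is_trivially_default_constructible<ModuleWithString>::value,
              "string member forces a constructor call");
```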
Patch 1 of 3 to fix long compile times of the Syms module in some
scenarios.
For handling $past and similar functions, we used to collect sampled
values of variables at the beginning of the main _eval function. If we
have many of these, this can grow _eval very large which can make C++
compilation very slow. Apply the usual fix of emitting the necessary code in
a separate function and then splitting it based on size.
AstNode::unlinkFrBackWithNext is O(n) if the subject node is not the
head of the list. We sometimes want to unlink the rest of the list
starting at the node after the head (e.g.: in
V3Sched::util::splitCheck). This patch makes that O(1) as well.
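A toy model of why unlinking right after the head is cheap, assuming a list where only the head records the tail and the head's back pointer is null. Verilator's AstNode uses different fields, so this is not its implementation. Finding the head normally means walking back; when the node directly follows the head, that walk is empty.

```cpp
#include <cassert>

// Toy list node; NOT Verilator's AstNode.
struct Node {
    Node* next = nullptr;
    Node* back = nullptr;  // previous node; nullptr for the head
    Node* tail = nullptr;  // only meaningful on the head
};

// Unlink 'n' and everything after it, returning it as a new list head.
// 'steps' counts the back-pointer hops needed to find the old head.
Node* unlinkWithNext(Node* n, int& steps) {
    assert(n->back && "n must not be the head");
    steps = 0;
    Node* head = n->back;
    // General case walks back to the head (O(n)); if n directly follows the
    // head, the loop body never runs (O(1)).
    while (head->back) { head = head->back; ++steps; }
    Node* prev = n->back;
    n->tail = head->tail;  // new list inherits the old tail
    head->tail = prev;     // old list now ends just before n
    prev->next = nullptr;
    n->back = nullptr;
    return n;
}
```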
AstMTaskBody is somewhat redundant and is problematic for #6280. We used
to wrap all MTasks in a CFunc before emit anyway. Now we create that
CFunc when we create the ExecMTask in V3OrderParallel, and subsequently
use the CFunc to represent the contents of the MTask. Final output and
optimizations are the same, but internals are simplified to move
towards #6280.
No functional change.
The 'act' region used to have 2 trigger vectors ('act' and 'pre'), now
it uses a single "extended" trigger vector where the top bits are what
used to be the used bits in the 'pre' trigger vector. Please see the
description above `TriggerKit`. Also move the extra triggers from the
low end to the high end in the trigger vectors.
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.
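A sketch of a trigger vector as a plain array of 64-bit words, with the few trigger-specific helpers as free functions. All names are hypothetical, and the layout (act triggers first, former 'pre' triggers starting at bit `nAct`) is an assumption based on the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Trigger vector as an ordinary unpacked array of 64-bit words.
template <std::size_t N>
using TrigVec = std::array<uint64_t, N>;

template <std::size_t N>
void setTrig(TrigVec<N>& v, std::size_t bit) { v[bit / 64] |= uint64_t{1} << (bit % 64); }

template <std::size_t N>
bool testTrig(const TrigVec<N>& v, std::size_t bit) { return (v[bit / 64] >> (bit % 64)) & 1; }

template <std::size_t N>
bool anyTrig(const TrigVec<N>& v) {
    for (uint64_t w : v) if (w) return true;
    return false;
}

// Extended 'act' vector: bits [0, nAct) are 'act' triggers; bits from nAct
// upward hold what used to be the separate 'pre' vector.
template <std::size_t N>
bool testPre(const TrigVec<N>& v, std::size_t nAct, std::size_t preBit) {
    return testTrig(v, nAct + preBit);
}
```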