AstNode::unlinkFrBackWithNext is O(n) if the subject node is not the
head of the list. We sometimes want to unlink the rest of the list
starting at the node after the head (e.g.: in
V3Sched::util::splitCheck), this patch makes that O(1) as well.
AstMTaskBody is somewhat redundant and is problematic for #6280. We used
to wrap all MTasks in a CFunc before emit anyway. Now we create that
CFunc when we create the ExecMTask in V3OrderParallel, and subsequently
use the CFunc to represent the contents of the MTask. Final output and
optimizations are the same, but internals are simplified to move
towards #6280.
No functional change.
The 'act' region used to have 2 trigger vectors ('act' and 'pre'), now
it uses a single "extended" trigger vector where the top bits are what
used to be the used bits in the 'pre' trigger vector. Please see the
description above `TriggerKit`. Also move the extra triggers from the
low end to the high end in the trigger vectors.
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.
Instead of using the number of processors in the host, use the number of
processors available to the process, respecting cpu affinity
assignments. Without pthreads, fall back and use the number of
processors in the host as before.
This is now applied everywhere so runing `nuamctl -C 0-3 verilator` or
`numactl -C 0-3 Vsim` should behave as if the host has 4 cores (e.g.
like in CI jobs)
Used to fail with "can't use --exe with --lib-create", and we didn't
have any tests for it before. (The equivalent --main --exe --build
--timing works)
Add V3SchedUtil.cpp that contains common small utility functions.
Add V3SchedTrigger.cpp that contains functionality building the trigger
mechanism code.
No functional change, just code movement. Prep for some further work.
The AstIf nodes conditional on events being triggered used to be created
in V3Clock. Now it is in V3Sched*, in order to avoid having to pass
AstActive in CFunc or MTask bodies. No functional change intended, some
improved optimization due to simplifying timing triggers that were
previously missed, also fixes what seems like a bug in the original
timing commit code.
sched_forks.tree used to be dumped before sched.tree, while it's
basically after, so move transformForks in to a separate pass. Also
extract inlined visitors in V3SchedTiming.
Accessing the ports of hier_block instances directly under the current
hier_block (or top level) work just fine (the heir stub .sv has them),
and this can simplify hooking up dotted references into hier blocks:
push part of the reference under the hier block into the hier block, and
wire it to a port, then resolve the rest of the reference to the port of
the instance.
Cache already traced vertices and reuse them on subsequent traces. This
avoids a potential combinatorial explosion in the size of the resulting
circuit.
Internals: Refactor AstNodeBlock representation (#6280)
AstNodeBlock now has 2 child lists: 'declsp' to hold declarations within
the block, and 'stmtsp' to hold the procedural statements.
AstBegin is then just a simple subtype of AstNodeBlock.
AstFork is a proper superset of AstNodeBlock (and also AstBegin), and
adds 'forksp' which hold the parallel statements. Having the sequential
'stmtsp' in AstFork is required to properly implement variable
initializers in fork blocks (IEEE 1800-2023 9.3.2), this makes that
clear, while also separating the non AstNodeStmt declarations
(for #6280). The actual fork branches in 'AstFork::forkps()' are all
AstBegin nodes. This is required as lowering stages will introduce
additional statements in each parallel branch. (We used to wrap AstFork
statements into AstBegin in 3 different places, now they always are
AstBegin and this is enforced via the type checker/V3Broken).
Also fixes incorrect disabling of forked processes from within the `fork`.
- Delete 'finalsp'. It was used in one place, basically unnecessary and
safe to remove.
- Make 'argsp' a 'List[AstVar]'. This held before. It holds the function
argument and return variables.
- Replace 'intitsp' with 'varsp' and make it into 'List[AstVar]' to hold
the function local variables. This was most of its use before. The few
places we inserted statements here now moved into 'stmtsp' by
inserting at the front of the list.
Remove the large variety of ways raw "text" is represented in the Ast.
Particularly, the only thing that represents a string to be emitted in
the output is AstText.
There are 5 AstNodes that can contain AstText, and V3Emit will throw an
error if an AstText is encountered anywhere else:
- AstCStmt: Internally generated procedural statements involving raw
text.
- AstCStmtUser: This is the old AstUCStmt, renamed so it sorts next to
AstCStmt, as it's largely equivalent. We should never create this
internally unless used to represent user input. It is used for $c,
statements in the input, and for some 'systemc_* blocks.
- AstCExpr: Internally generaged expression involving raw text.
- AstCExprUser: This is the old AstUCFunc, renamed so it sorts next to
AstCExpr. It is largely equivalent, but also has more optimizations
disabled. This should never be created internally, it is only used for
$c expressions in the input.
- AstTextBlock: Use by V3ProtectLib only, to generate the hierarchical
wrappers.
Text "tracking" for indentation is always on for AstCStmt, AstCExpr, and
AstTextBlock, as these are always generated by us, and should always be
well formed.
Tracking is always off for AstCStmtUser and AstCExprUser, as these
contain arbitrary user input that might not be safe to parse for
indentation.
Remove subsequently redundant AstNodeSimpleText and AstNodeText types.
This patch also fixes incorrect indentation in emitted waveform tracing
functions, and makes the output more readable for hier block SV stubs.
With that, all raw text nodes are handled as a proper AstNodeStmt or
AstNodeExpr as required for #6280.
There was exactly one place in V3Task, handling DPI arguments when we
relied on cleanOut of AstCExpr being false for masking. Made that code
do the relevant masking via a few new run-time functions, which also
eliminates some special cases in the relevant V3Task functions.
Initial idea was to remodel AssignW as Assign under Alway. Trying that
uncovered some issues, the most difficult of them was that a delay
attached to a continuous assignment behaves differently from a delay
attached to a blocking assignment statement, so we need to keep the
knowledge of which flavour an assignment was until V3Timing.
So instead of removing AstAssignW, we always wrap it in an AstAlways,
with a special `keyword()` type. This makes it into a proper procedural
statement, which is almost equivalent to AstAssign, except for the case
when they contain a delay. We still gain the benefits of #6280 and can
simplify some code. Every AstNodeStmt should now be under an
AstNodeProcedure - which we should rename to AstProcess, or an
AstNodeFTask). As a result, V3Table can now handle AssignW for free.
Also uncovered and fixed a bug in handling intra-assignment delays if
a function is present on the RHS of an AssignW.
There is more work to be done towards #6280, and potentially simplifying
AssignW handing, but this is the minimal change required to tick it off
the TODO list for #6280.
I stared this because the emitted makefiles for hierarchical verilation
were non-deterministic (iterating unordered_map indexed by pointers).
Then I realized that the V3HierPlan is just a dependency graph encoded
in a slightly idiosyncratic way. We do have a data structure to use for
that instead.
With that the output should always be deterministic + have nicer dumps.
All code is built as C++ via CXX, but we still have some references to
CC. Trying to make sure we don't add plain C later by hiding the C
compiler. (So it's always enough to override CXX=... in configure)
This patch gets rid of over 80% of temporary dynamic memory allocations
(when a malloced node is immediately freed with no other malloc in
between). It also gets rid of over 20% of all calls to malloc.
It's worth ~3% average verilation speed up with tcmalloc, and more
without tcmalloc.
This patch implements #6480. All loop statements are represented using
AstLoop and AstLoopTest.
This necessitates rework of the loop unroller to handle loops of
arbitrary form. To enable this, I have split the old unroller used for
'generate for' statements and moved it into V3Param, and subsequently
rewrote V3Unroll to handle the new representation. V3Unroll can now
unroll more complex loops, including with loop conditions containing
multiple variable references or inlined functions.
Handling the more generic code also requires some restrictions. If a
loop contains any of the following, it cannot be unrolled:
- A timing control that might suspend the loop
- A non-inlined call to a non-pure function
These constructs can change the values of variables in the loop, so are
generally not safe to unroll if they are present. (We could still unroll
if all the variables needed for unrolling are automatic, however we
don't do that right now.)
These restrictions seem ok in the benchmark suite, where the new
unroller can generally unroll many more loops than before.
Internals: Refactor generate construct Ast handling (#6280)
We introduce AstNodeGen, the common base class of AstGenBlock,
AstGenCase, AstGenFor, and AstGenIf, which together represent all SV
generate constructs. Subsequently remove AstNodeFor, AstNodeCase
(AstCase is now directly derived from AstNodeStmt) and adjust internals
to work on the new representation.
Output is identical modulo hashes do to changed AstNode type ids, no
functional change intended.
Step towards #6280.
Rename AstAssignAlias to AstAlias and make it derive from AstNode
instead of AstNodeStmt.
Replace AstAlias with AstAssignW in V3LinkDot::linkDotScope, which is
the last place we need to be aware of the alias construct. Using
AstAssignW dowstream enables further optimization while preserving the
same functionality.
The only use for the clocker attribute and the AstVar::isUsedClock that
is actually necessary today for correctness is to mark top level inputs
of --lib-create blocks as being (or driving) a clock signal. Correctness
of --lib-create (and hence hierarchical blocks) actually used to depend
on having the right optimizations eliminate intermediate clocks (e.g.:
V3Gate), when the top level port was not used directly in a sensitivity
list, or marking top level signals manually via --clk or the clocker
attribute. However V3Sched::partition already needs to trace through the
logic to figure out what signals might drive a sensitivity list, so it
can very easily mark all top level inputs as such.
In this patch we remove the AstVar::attrClocker and AstVar::isUsedClock
attributes, and replace them with AstVar::isPrimaryClock, automatically
set by V3Sched::partition. This eliminates all need for manual
annotation so we are deprecating the --clk/--no-clk options and the
clocker/no_clocker attributes.
This also eliminates the opportunity for any further mis-optimization
similar to #6453.
Regarding the other uses of the removed AstVar attributes:
- As of 5.000, initial edges are triggered via a separate mechanism
applied in V3Sched, so the use in V3EmitCFunc.cpp is redundant
- Also as of 5.000, we can handle arbitrary sensitivity expressions, so
the restriction on eliminating clock signals in V3Gate is unnecessary
- Since the recent change when Dfg is applied after V3Scope, it does
perform the equivalent of GateClkDecomp, so we can delete that pass.
These are no longer required for correct scheduling. They are still
accepted for backward compatibility, but have no effect on simulation
and are dropped in the front-end. Also removed the then redundant
AstAlwaysPublic class.
Fixes#6442
Added a lock around the table used to detect memory leaks. Note the only
part that is multi-threaded at the moment is V3Emit where we create
AstCFiles and the like, so the lock should be almost always uncontested.
This code is not thread safe. Specifically AstNode constructors are not
thread safe, as they may create entries in the shared Dtype table via
which can be racy.