Combined the 3 various APIs used in EmitC* passes to handle file
opening/splitting into a single one. This removes a lot of copy paste
and makes everything consistent.
All C++ file handling goes through `EmitCBaseVisitor` using the
`openNewOutputHeaderFile`, `openNewOutputSourceFile` and
`closOutputFile` methods.
To emit a new kind of file, always derive a new class from
`EmitCBaseVisitor`, and use the above APIs, they will take care of
everything else in a consistent matter.
Subsequently also removed V3OutSCFile, and instead included
verilated_sc.h (which included the systemc header itself) in the two
files that need it (the primary model header, and the root module
header).
Functional changes:
- The PCH header did not use to have a corresponding AstCFile. Now it
does, though this makes no difference in the output
- All 'slow' sources now have '__Slow' in the name automatically (the
only one missing was for the ConstPool files)
Rest of the output is identical except for the header line now being
present in all generated C++ files.
The Syms class can contain a very large number of VeriltedScope
instances if `--vpi` is used, all of which need a call to the default
constructor in the constructor of the Syms class. This can lead to very
long compilation times, even without optimization on some compilers.
To avoid the constructor calls, hold VeriltedScope via pointers in the
Syms class, and explicitly new and delete them in the Syms
constructor/destructor. These explicit new/delte can then be
automatically split up into sub functions when the Syms
constructor/destructor become large.
Regarding run-time performance, this should have no significant effect,
most interactions are either during construction/destruction of the Syms
object, or are via pointers already. The one place where we used to
refer to VerilatedScope instances is when emitting an AstScopeName for
things like $display %m. For those there will be an extra load
instruction at run-time, which should not make a big difference.
Patch 3 of 3 to fix long compile times of the Syms module in some
scenarios.
Splitting of the Syms constructor/destructor were a bit arbitrarily
enforced with some parts splitable, while others not. There was also an
issue that even if the constructor and destructor bodies were split, we
would still end up with both in the same file that was double the size of
the intended split limit.
To fix, first all statements required in the Syms constructor and
destructor are gathered into a vector, then if the total number of
statements required for both is bigger than the split limit, the
implementations are split into sub-functions, one per file, as before,
ensuring that none of the functions are bigger than the split limit.
Also add __Slow suffix to the names of the files.
Patch 2 of 3 to fix long compile times of the Syms module in some
scenarios.
In order to avoid long compile times of the Syms constructor due to
having a very large number of member constructor sto call, move to using
explicit ctor/dtor functions for all but the root VerilatedModule. The
root module needs a constructor as it has non-default-constructible
members. The other modules don't.
This is only part of the fix, as in order to avoid having a default
constructor call the VerilatedModule needs to be default constructible.
I think this is now true for modules that do not contain strings or
other non trivially constructible/destructible variables.
Patch 1 of 3 to fix long compile times of the Syms module in some
scenarios.
For handling $past and similar functions, we used to collect sampled
values of variables at the beginning of the main _eval function. If we
have many of these, this can grow _eval very large which can make C++
compilation very slow. Apply usual fix of emitting the necessary code in
a separate function and then splitting it based on size.
AstNode::unlinkFrBackWithNext is O(n) if the subject node is not the
head of the list. We sometimes want to unlink the rest of the list
starting at the node after the head (e.g.: in
V3Sched::util::splitCheck), this patch makes that O(1) as well.
AstMTaskBody is somewhat redundant and is problematic for #6280. We used
to wrap all MTasks in a CFunc before emit anyway. Now we create that
CFunc when we create the ExecMTask in V3OrderParallel, and subsequently
use the CFunc to represent the contents of the MTask. Final output and
optimizations are the same, but internals are simplified to move
towards #6280.
No functional change.
The 'act' region used to have 2 trigger vectors ('act' and 'pre'), now
it uses a single "extended" trigger vector where the top bits are what
used to be the used bits in the 'pre' trigger vector. Please see the
description above `TriggerKit`. Also move the extra triggers from the
low end to the high end in the trigger vectors.
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.
Instead of using the number of processors in the host, use the number of
processors available to the process, respecting cpu affinity
assignments. Without pthreads, fall back and use the number of
processors in the host as before.
This is now applied everywhere so runing `nuamctl -C 0-3 verilator` or
`numactl -C 0-3 Vsim` should behave as if the host has 4 cores (e.g.
like in CI jobs)
Used to fail with "can't use --exe with --lib-create", and we didn't
have any tests for it before. (The equivalent --main --exe --build
--timing works)
Add V3SchedUtil.cpp that contains common small utility functions.
Add V3SchedTrigger.cpp that contains functionality building the trigger
mechanism code.
No functional change, just code movement. Prep for some further work.
The AstIf nodes conditional on events being triggered used to be created
in V3Clock. Now it is in V3Sched*, in order to avoid having to pass
AstActive in CFunc or MTask bodies. No functional change intended, some
improved optimization due to simplifying timing triggers that were
previously missed, also fixes what seems like a bug in the original
timing commit code.
sched_forks.tree used to be dumped before sched.tree, while it's
basically after, so move transformForks in to a separate pass. Also
extract inlined visitors in V3SchedTiming.
Accessing the ports of hier_block instances directly under the current
hier_block (or top level) work just fine (the heir stub .sv has them),
and this can simplify hooking up dotted references into hier blocks:
push part of the reference under the hier block into the hier block, and
wire it to a port, then resolve the rest of the reference to the port of
the instance.
Cache already traced vertices and reuse them on subsequent traces. This
avoids a potential combinatorial explosion in the size of the resulting
circuit.
Internals: Refactor AstNodeBlock representation (#6280)
AstNodeBlock now has 2 child lists: 'declsp' to hold declarations within
the block, and 'stmtsp' to hold the procedural statements.
AstBegin is then just a simple subtype of AstNodeBlock.
AstFork is a proper superset of AstNodeBlock (and also AstBegin), and
adds 'forksp' which hold the parallel statements. Having the sequential
'stmtsp' in AstFork is required to properly implement variable
initializers in fork blocks (IEEE 1800-2023 9.3.2), this makes that
clear, while also separating the non AstNodeStmt declarations
(for #6280). The actual fork branches in 'AstFork::forkps()' are all
AstBegin nodes. This is required as lowering stages will introduce
additional statements in each parallel branch. (We used to wrap AstFork
statements into AstBegin in 3 different places, now they always are
AstBegin and this is enforced via the type checker/V3Broken).
Also fixes incorrect disabling of forked processes from within the `fork`.
- Delete 'finalsp'. It was used in one place, basically unnecessary and
safe to remove.
- Make 'argsp' a 'List[AstVar]'. This held before. It holds the function
argument and return variables.
- Replace 'intitsp' with 'varsp' and make it into 'List[AstVar]' to hold
the function local variables. This was most of its use before. The few
places we inserted statements here now moved into 'stmtsp' by
inserting at the front of the list.
Remove the large variety of ways raw "text" is represented in the Ast.
Particularly, the only thing that represents a string to be emitted in
the output is AstText.
There are 5 AstNodes that can contain AstText, and V3Emit will throw an
error if an AstText is encountered anywhere else:
- AstCStmt: Internally generated procedural statements involving raw
text.
- AstCStmtUser: This is the old AstUCStmt, renamed so it sorts next to
AstCStmt, as it's largely equivalent. We should never create this
internally unless used to represent user input. It is used for $c,
statements in the input, and for some 'systemc_* blocks.
- AstCExpr: Internally generaged expression involving raw text.
- AstCExprUser: This is the old AstUCFunc, renamed so it sorts next to
AstCExpr. It is largely equivalent, but also has more optimizations
disabled. This should never be created internally, it is only used for
$c expressions in the input.
- AstTextBlock: Use by V3ProtectLib only, to generate the hierarchical
wrappers.
Text "tracking" for indentation is always on for AstCStmt, AstCExpr, and
AstTextBlock, as these are always generated by us, and should always be
well formed.
Tracking is always off for AstCStmtUser and AstCExprUser, as these
contain arbitrary user input that might not be safe to parse for
indentation.
Remove subsequently redundant AstNodeSimpleText and AstNodeText types.
This patch also fixes incorrect indentation in emitted waveform tracing
functions, and makes the output more readable for hier block SV stubs.
With that, all raw text nodes are handled as a proper AstNodeStmt or
AstNodeExpr as required for #6280.
There was exactly one place in V3Task, handling DPI arguments when we
relied on cleanOut of AstCExpr being false for masking. Made that code
do the relevant masking via a few new run-time functions, which also
eliminates some special cases in the relevant V3Task functions.
Initial idea was to remodel AssignW as Assign under Alway. Trying that
uncovered some issues, the most difficult of them was that a delay
attached to a continuous assignment behaves differently from a delay
attached to a blocking assignment statement, so we need to keep the
knowledge of which flavour an assignment was until V3Timing.
So instead of removing AstAssignW, we always wrap it in an AstAlways,
with a special `keyword()` type. This makes it into a proper procedural
statement, which is almost equivalent to AstAssign, except for the case
when they contain a delay. We still gain the benefits of #6280 and can
simplify some code. Every AstNodeStmt should now be under an
AstNodeProcedure - which we should rename to AstProcess, or an
AstNodeFTask). As a result, V3Table can now handle AssignW for free.
Also uncovered and fixed a bug in handling intra-assignment delays if
a function is present on the RHS of an AssignW.
There is more work to be done towards #6280, and potentially simplifying
AssignW handing, but this is the minimal change required to tick it off
the TODO list for #6280.
I stared this because the emitted makefiles for hierarchical verilation
were non-deterministic (iterating unordered_map indexed by pointers).
Then I realized that the V3HierPlan is just a dependency graph encoded
in a slightly idiosyncratic way. We do have a data structure to use for
that instead.
With that the output should always be deterministic + have nicer dumps.