Splitting of the Syms constructor/destructor were a bit arbitrarily
enforced with some parts splitable, while others not. There was also an
issue that even if the constructor and destructor bodies were split, we
would still end up with both in the same file that was double the size of
the intended split limit.
To fix, first all statements required in the Syms constructor and
destructor are gathered into a vector, then if the total number of
statements required for both is bigger than the split limit, the
implementations are split into sub-functions, one per file, as before,
ensuring that none of the functions are bigger than the split limit.
Also add __Slow suffix to the names of the files.
Patch 2 of 3 to fix long compile times of the Syms module in some
scenarios.
The 'act' region used to have 2 trigger vectors ('act' and 'pre'), now
it uses a single "extended" trigger vector where the top bits are what
used to be the used bits in the 'pre' trigger vector. Please see the
description above `TriggerKit`. Also move the extra triggers from the
low end to the high end in the trigger vectors.
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.
Remove the large variety of ways raw "text" is represented in the Ast.
Particularly, the only thing that represents a string to be emitted in
the output is AstText.
There are 5 AstNodes that can contain AstText, and V3Emit will throw an
error if an AstText is encountered anywhere else:
- AstCStmt: Internally generated procedural statements involving raw
text.
- AstCStmtUser: This is the old AstUCStmt, renamed so it sorts next to
AstCStmt, as it's largely equivalent. We should never create this
internally unless used to represent user input. It is used for $c,
statements in the input, and for some 'systemc_* blocks.
- AstCExpr: Internally generaged expression involving raw text.
- AstCExprUser: This is the old AstUCFunc, renamed so it sorts next to
AstCExpr. It is largely equivalent, but also has more optimizations
disabled. This should never be created internally, it is only used for
$c expressions in the input.
- AstTextBlock: Use by V3ProtectLib only, to generate the hierarchical
wrappers.
Text "tracking" for indentation is always on for AstCStmt, AstCExpr, and
AstTextBlock, as these are always generated by us, and should always be
well formed.
Tracking is always off for AstCStmtUser and AstCExprUser, as these
contain arbitrary user input that might not be safe to parse for
indentation.
Remove subsequently redundant AstNodeSimpleText and AstNodeText types.
This patch also fixes incorrect indentation in emitted waveform tracing
functions, and makes the output more readable for hier block SV stubs.
With that, all raw text nodes are handled as a proper AstNodeStmt or
AstNodeExpr as required for #6280.
This patch implements #6480. All loop statements are represented using
AstLoop and AstLoopTest.
This necessitates rework of the loop unroller to handle loops of
arbitrary form. To enable this, I have split the old unroller used for
'generate for' statements and moved it into V3Param, and subsequently
rewrote V3Unroll to handle the new representation. V3Unroll can now
unroll more complex loops, including with loop conditions containing
multiple variable references or inlined functions.
Handling the more generic code also requires some restrictions. If a
loop contains any of the following, it cannot be unrolled:
- A timing control that might suspend the loop
- A non-inlined call to a non-pure function
These constructs can change the values of variables in the loop, so are
generally not safe to unroll if they are present. (We could still unroll
if all the variables needed for unrolling are automatic, however we
don't do that right now.)
These restrictions seem ok in the benchmark suite, where the new
unroller can generally unroll many more loops than before.
Rename AstAssignAlias to AstAlias and make it derive from AstNode
instead of AstNodeStmt.
Replace AstAlias with AstAssignW in V3LinkDot::linkDotScope, which is
the last place we need to be aware of the alias construct. Using
AstAssignW dowstream enables further optimization while preserving the
same functionality.
The only use for the clocker attribute and the AstVar::isUsedClock that
is actually necessary today for correctness is to mark top level inputs
of --lib-create blocks as being (or driving) a clock signal. Correctness
of --lib-create (and hence hierarchical blocks) actually used to depend
on having the right optimizations eliminate intermediate clocks (e.g.:
V3Gate), when the top level port was not used directly in a sensitivity
list, or marking top level signals manually via --clk or the clocker
attribute. However V3Sched::partition already needs to trace through the
logic to figure out what signals might drive a sensitivity list, so it
can very easily mark all top level inputs as such.
In this patch we remove the AstVar::attrClocker and AstVar::isUsedClock
attributes, and replace them with AstVar::isPrimaryClock, automatically
set by V3Sched::partition. This eliminates all need for manual
annotation so we are deprecating the --clk/--no-clk options and the
clocker/no_clocker attributes.
This also eliminates the opportunity for any further mis-optimization
similar to #6453.
Regarding the other uses of the removed AstVar attributes:
- As of 5.000, initial edges are triggered via a separate mechanism
applied in V3Sched, so the use in V3EmitCFunc.cpp is redundant
- Also as of 5.000, we can handle arbitrary sensitivity expressions, so
the restriction on eliminating clock signals in V3Gate is unnecessary
- Since the recent change when Dfg is applied after V3Scope, it does
perform the equivalent of GateClkDecomp, so we can delete that pass.
Having many triggers still hits a bottleneck in LLVM leading to long
compile times.
Instead of setting triggers bit-wise, set them as a whole 64-bit word
when possible. This improves C++ compile times by ~4x on some large
designs and has minor run-time performance benefit.
* Refactor V3Delay for extensibility
Introduce the concept of an "NBA Scheme", which is the lowering pattern
we can use for various variables that are the targets of NBAs.
E.g.:
- ShadowVariable (old default scheme)
- FlagShared (old array set flag scheme)
- ValueQueueWhole (recent dynamic commit queue)
We now analyse all AstAssignDly before making any decisions on which
scheme to apply. We then choose a specific scheme for each variable that
is the target of an NBA, and then all NBAs targeting that variable use
the same scheme. This enables easy mix and match of schemes as needed,
while remaining consistent by design after extensions.
Output is perturbed due to node insertion order, but no functional
or performance change is intended.
Continuing the idea of decoupling the implementations of the various algorithms.
The main points:
-Move the former "processDomain" stuff, dealing with assigning combinational logic into the relevant sensitivity domains into V3OrderProcessDomains.cpp
-Move the parallel code construction in V3OrderParallel.cpp (Could combine this with some parts of V3Partition - those not called from V3Partition::finalize - but that's not for this patch).
-Move the serial code construction into V3OrderSerial.cpp
-Factored the very small common code between the parallel and serial code construction (processMoveOneLogic) into V3OrderCFuncEmitter.cpp
Pack the elements of VlTriggerVec as dense bits (instead of a 1 byte
bool per bit), and check whether they are set on a word granularity.
This effectively transforms conditions of the form `if (trig.at(0) |
trig.at(2) | trig.at(64))` into `if (trig.word(0) & 0x5 | trig.word(1) &
0x1)`. This improves OpenTitan ST by about 1%, worth more on some other
designs.
Apart from the representational changes below, this patch renames
AstNodeMath to AstNodeExpr, and AstCMath to AstCExpr.
Now every expression (i.e.: those AstNodes that represent a [possibly
void] value, with value being interpreted in a very general sense) has
AstNodeExpr as a super class. This necessitates the introduction of an
AstStmtExpr, which represents an expression in statement position, e.g :
'foo();' would be represented as AstStmtExpr(AstCCall(foo)). In exchange
we can get rid of isStatement() in AstNodeStmt, which now really always
represent a statement
Peak memory consumption and verilation speed are not measurably changed.
Partial step towards #3420
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278
IEEE 1800-2017 6.11.3 says these types are unsigned. Until now these
types were treated as not having a signedness (NOSIGN), and nodes having
these types were later resolved by V3Width to be unsigned. This is a bit
problematic when creating nodes of these types after V3Width. Treating
these types as unsigned from the get go is fine, and actually improves
generated code slightly.
* EmitXml: Added <ccall>, <constpool>, <initarray>/<inititem>, wrapped children of <if> and <while> with <begin> elements to prevent ambiguity
* EmitXml: added signed="true" to signed basicdtypes