This patch implements #6480. All loop statements are represented using
AstLoop and AstLoopTest.
This necessitates rework of the loop unroller to handle loops of
arbitrary form. To enable this, I have split the old unroller used for
'generate for' statements and moved it into V3Param, and subsequently
rewrote V3Unroll to handle the new representation. V3Unroll can now
unroll more complex loops, including with loop conditions containing
multiple variable references or inlined functions.
Handling the more generic code also requires some restrictions. If a
loop contains any of the following, it cannot be unrolled:
- A timing control that might suspend the loop
- A non-inlined call to a non-pure function
These constructs can change the values of variables in the loop, so are
generally not safe to unroll if they are present. (We could still unroll
if all the variables needed for unrolling are automatic, however we
don't do that right now.)
These restrictions seem ok in the benchmark suite, where the new
unroller can generally unroll many more loops than before.
Rename AstAssignAlias to AstAlias and make it derive from AstNode
instead of AstNodeStmt.
Replace AstAlias with AstAssignW in V3LinkDot::linkDotScope, which is
the last place we need to be aware of the alias construct. Using
AstAssignW dowstream enables further optimization while preserving the
same functionality.
The only use for the clocker attribute and the AstVar::isUsedClock that
is actually necessary today for correctness is to mark top level inputs
of --lib-create blocks as being (or driving) a clock signal. Correctness
of --lib-create (and hence hierarchical blocks) actually used to depend
on having the right optimizations eliminate intermediate clocks (e.g.:
V3Gate), when the top level port was not used directly in a sensitivity
list, or marking top level signals manually via --clk or the clocker
attribute. However V3Sched::partition already needs to trace through the
logic to figure out what signals might drive a sensitivity list, so it
can very easily mark all top level inputs as such.
In this patch we remove the AstVar::attrClocker and AstVar::isUsedClock
attributes, and replace them with AstVar::isPrimaryClock, automatically
set by V3Sched::partition. This eliminates all need for manual
annotation so we are deprecating the --clk/--no-clk options and the
clocker/no_clocker attributes.
This also eliminates the opportunity for any further mis-optimization
similar to #6453.
Regarding the other uses of the removed AstVar attributes:
- As of 5.000, initial edges are triggered via a separate mechanism
applied in V3Sched, so the use in V3EmitCFunc.cpp is redundant
- Also as of 5.000, we can handle arbitrary sensitivity expressions, so
the restriction on eliminating clock signals in V3Gate is unnecessary
- Since the recent change when Dfg is applied after V3Scope, it does
perform the equivalent of GateClkDecomp, so we can delete that pass.
Having many triggers still hits a bottleneck in LLVM leading to long
compile times.
Instead of setting triggers bit-wise, set them as a whole 64-bit word
when possible. This improves C++ compile times by ~4x on some large
designs and has minor run-time performance benefit.
* Refactor V3Delay for extensibility
Introduce the concept of an "NBA Scheme", which is the lowering pattern
we can use for various variables that are the targets of NBAs.
E.g.:
- ShadowVariable (old default scheme)
- FlagShared (old array set flag scheme)
- ValueQueueWhole (recent dynamic commit queue)
We now analyse all AstAssignDly before making any decisions on which
scheme to apply. We then choose a specific scheme for each variable that
is the target of an NBA, and then all NBAs targeting that variable use
the same scheme. This enables easy mix and match of schemes as needed,
while remaining consistent by design after extensions.
Output is perturbed due to node insertion order, but no functional
or performance change is intended.
Continuing the idea of decoupling the implementations of the various algorithms.
The main points:
-Move the former "processDomain" stuff, dealing with assigning combinational logic into the relevant sensitivity domains into V3OrderProcessDomains.cpp
-Move the parallel code construction in V3OrderParallel.cpp (Could combine this with some parts of V3Partition - those not called from V3Partition::finalize - but that's not for this patch).
-Move the serial code construction into V3OrderSerial.cpp
-Factored the very small common code between the parallel and serial code construction (processMoveOneLogic) into V3OrderCFuncEmitter.cpp
Pack the elements of VlTriggerVec as dense bits (instead of a 1 byte
bool per bit), and check whether they are set on a word granularity.
This effectively transforms conditions of the form `if (trig.at(0) |
trig.at(2) | trig.at(64))` into `if (trig.word(0) & 0x5 | trig.word(1) &
0x1)`. This improves OpenTitan ST by about 1%, worth more on some other
designs.
Apart from the representational changes below, this patch renames
AstNodeMath to AstNodeExpr, and AstCMath to AstCExpr.
Now every expression (i.e.: those AstNodes that represent a [possibly
void] value, with value being interpreted in a very general sense) has
AstNodeExpr as a super class. This necessitates the introduction of an
AstStmtExpr, which represents an expression in statement position, e.g :
'foo();' would be represented as AstStmtExpr(AstCCall(foo)). In exchange
we can get rid of isStatement() in AstNodeStmt, which now really always
represent a statement
Peak memory consumption and verilation speed are not measurably changed.
Partial step towards #3420
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278
IEEE 1800-2017 6.11.3 says these types are unsigned. Until now these
types were treated as not having a signedness (NOSIGN), and nodes having
these types were later resolved by V3Width to be unsigned. This is a bit
problematic when creating nodes of these types after V3Width. Treating
these types as unsigned from the get go is fine, and actually improves
generated code slightly.
* EmitXml: Added <ccall>, <constpool>, <initarray>/<inititem>, wrapped children of <if> and <while> with <begin> elements to prevent ambiguity
* EmitXml: added signed="true" to signed basicdtypes