Remove the large variety of ways raw "text" is represented in the Ast.
Particularly, the only thing that represents a string to be emitted in
the output is AstText.
There are 5 AstNodes that can contain AstText, and V3Emit will throw an
error if an AstText is encountered anywhere else:
- AstCStmt: Internally generated procedural statements involving raw
text.
- AstCStmtUser: This is the old AstUCStmt, renamed so it sorts next to
AstCStmt, as it's largely equivalent. We should never create this
internally unless used to represent user input. It is used for $c,
statements in the input, and for some 'systemc_* blocks.
- AstCExpr: Internally generaged expression involving raw text.
- AstCExprUser: This is the old AstUCFunc, renamed so it sorts next to
AstCExpr. It is largely equivalent, but also has more optimizations
disabled. This should never be created internally, it is only used for
$c expressions in the input.
- AstTextBlock: Use by V3ProtectLib only, to generate the hierarchical
wrappers.
Text "tracking" for indentation is always on for AstCStmt, AstCExpr, and
AstTextBlock, as these are always generated by us, and should always be
well formed.
Tracking is always off for AstCStmtUser and AstCExprUser, as these
contain arbitrary user input that might not be safe to parse for
indentation.
Remove subsequently redundant AstNodeSimpleText and AstNodeText types.
This patch also fixes incorrect indentation in emitted waveform tracing
functions, and makes the output more readable for hier block SV stubs.
With that, all raw text nodes are handled as a proper AstNodeStmt or
AstNodeExpr as required for #6280.
Initial idea was to remodel AssignW as Assign under Alway. Trying that
uncovered some issues, the most difficult of them was that a delay
attached to a continuous assignment behaves differently from a delay
attached to a blocking assignment statement, so we need to keep the
knowledge of which flavour an assignment was until V3Timing.
So instead of removing AstAssignW, we always wrap it in an AstAlways,
with a special `keyword()` type. This makes it into a proper procedural
statement, which is almost equivalent to AstAssign, except for the case
when they contain a delay. We still gain the benefits of #6280 and can
simplify some code. Every AstNodeStmt should now be under an
AstNodeProcedure - which we should rename to AstProcess, or an
AstNodeFTask). As a result, V3Table can now handle AssignW for free.
Also uncovered and fixed a bug in handling intra-assignment delays if
a function is present on the RHS of an AssignW.
There is more work to be done towards #6280, and potentially simplifying
AssignW handing, but this is the minimal change required to tick it off
the TODO list for #6280.
This patch gets rid of over 80% of temporary dynamic memory allocations
(when a malloced node is immediately freed with no other malloc in
between). It also gets rid of over 20% of all calls to malloc.
It's worth ~3% average verilation speed up with tcmalloc, and more
without tcmalloc.
This patch implements #6480. All loop statements are represented using
AstLoop and AstLoopTest.
This necessitates rework of the loop unroller to handle loops of
arbitrary form. To enable this, I have split the old unroller used for
'generate for' statements and moved it into V3Param, and subsequently
rewrote V3Unroll to handle the new representation. V3Unroll can now
unroll more complex loops, including with loop conditions containing
multiple variable references or inlined functions.
Handling the more generic code also requires some restrictions. If a
loop contains any of the following, it cannot be unrolled:
- A timing control that might suspend the loop
- A non-inlined call to a non-pure function
These constructs can change the values of variables in the loop, so are
generally not safe to unroll if they are present. (We could still unroll
if all the variables needed for unrolling are automatic, however we
don't do that right now.)
These restrictions seem ok in the benchmark suite, where the new
unroller can generally unroll many more loops than before.
Internals: Refactor generate construct Ast handling (#6280)
We introduce AstNodeGen, the common base class of AstGenBlock,
AstGenCase, AstGenFor, and AstGenIf, which together represent all SV
generate constructs. Subsequently remove AstNodeFor, AstNodeCase
(AstCase is now directly derived from AstNodeStmt) and adjust internals
to work on the new representation.
Output is identical modulo hashes do to changed AstNode type ids, no
functional change intended.
Step towards #6280.
The only use for the clocker attribute and the AstVar::isUsedClock that
is actually necessary today for correctness is to mark top level inputs
of --lib-create blocks as being (or driving) a clock signal. Correctness
of --lib-create (and hence hierarchical blocks) actually used to depend
on having the right optimizations eliminate intermediate clocks (e.g.:
V3Gate), when the top level port was not used directly in a sensitivity
list, or marking top level signals manually via --clk or the clocker
attribute. However V3Sched::partition already needs to trace through the
logic to figure out what signals might drive a sensitivity list, so it
can very easily mark all top level inputs as such.
In this patch we remove the AstVar::attrClocker and AstVar::isUsedClock
attributes, and replace them with AstVar::isPrimaryClock, automatically
set by V3Sched::partition. This eliminates all need for manual
annotation so we are deprecating the --clk/--no-clk options and the
clocker/no_clocker attributes.
This also eliminates the opportunity for any further mis-optimization
similar to #6453.
Regarding the other uses of the removed AstVar attributes:
- As of 5.000, initial edges are triggered via a separate mechanism
applied in V3Sched, so the use in V3EmitCFunc.cpp is redundant
- Also as of 5.000, we can handle arbitrary sensitivity expressions, so
the restriction on eliminating clock signals in V3Gate is unnecessary
- Since the recent change when Dfg is applied after V3Scope, it does
perform the equivalent of GateClkDecomp, so we can delete that pass.
Added cppcheck-suppressions.txt in the repo root. You can add new
patterns in there instead of having to parse the XML output.
Also configure to add the -D__GNUC__ preprocessor macro, which makes it
understand UASSERT (it understands the 'noreturn' function attribute).
Added some case by case specific suppressions and fixed up other code,
especially in V3Ast*h and V3Dfg*.h, including code generated by astgen
that had some no-ops that irks cppcheck.
One thing it does not seem to like is `const` class members with default
initializers in the class. It will assume that's always the value, even
if overridden in the constructor. We had few so removed them.
With that a lot of files in `src/` are now clean or only have a handful
of issues. Therefore, I have also deleted cppcheck_filtered, and made it
produce human readable output straight to the terminal.
Regarding cleaning up the reported nits, I kind of got bored after
V3[A-E] so pausing here. Apologies for the merge conflicts.
Tested with cppcheck 2.13.0
Remove AstJumpLabel
AstJumpGo now references one if its enclosing AstJumpBlocks, and
branches straight after the referenced block.
That is:
```
JumpBlock a {
...
JumpGo(a);
...
}
// <--- the JumpGo(a) goes here
```
This is sufficient for all use cases and makes control flow much easier to
reason about. As a result, V3Const can optimize a bit more aggressively.
Second half of, and fixes#6216
Somewhat commonly, there is code out there that compares an expression (or
variable) against many different constants, e.g. a one-hot decoder:
```systemverilog
assign oneHot = {x == 3, x == 2, x == 1, x == 0};
```
If the width of the expression is sufficiently large, this can blow up
a GCC pass and take an egregious amount of memory and time to compile.
Adding a new DFG pass that will generate a cheap one-hot decoder:
to compute:
```systemverilog
wire [$bits(x)-1:0] idx = <the expression being compared many times>
reg tab [1<<$bits(x)] = '{default: 0};
reg [$bits(x)-1:0] pre = '0;
always_comb begin
tab[pre] = 0;
tab[idx] = 1;
pre = idx ; // This assignment marked to avoid a false UNOPFTLAT
end
```
We then replace the comparisons `x == CONST` with `tab[CONST]`.
This is generally performance neutral, but avoids the compile time and memory
blowup with GCC (128GB+ -> 1GB in one example).
We do not apply this if the comparisons seem to be part of a `COMPARE ?
val : COND` conditional tree, which the C++ compilers can turn into jump
tables.
This enables all XiangShan configurations from RTLMeter to now build with GCC,
so in this patch we enabled those in the nightly runs.
Having many triggers still hits a bottleneck in LLVM leading to long
compile times.
Instead of setting triggers bit-wise, set them as a whole 64-bit word
when possible. This improves C++ compile times by ~4x on some large
designs and has minor run-time performance benefit.