The DFG peephole pass converts all associative trees into right leaning,
which is good for simplifying pattern recognition, but can lead to an
excessive amount of wide intermediate results being constructed for
right leaning concatenations.
Add a new pass to balance concatenation trees by trying to:
- Create VL_EDATASIZE (32-bit) sub-terms, so words can then be packed
easily afterwards
- Try to ensure the operands of a concat are roughly the same width
within a concatenation tree. This does not yield the shortest tree,
but it ensures it has many sub-nodes that are small enough to fit into
machine registers.
This can eliminate a lot of wide intermediate results, which would need
temporaries, and also increases ILP within sub-expressions (assuming the
C compiler can't figure that out itself).
This is over 2x run-time speedup on the high_perf configuration of
VeeR EH2 (which you could arguably also get with -fno-dfg, but oh well).
An ordering constraint between NBA commit blocks ('Post' logic) and the
written variable were previously added as soft constraints (cutable
edges). However these are required for correctness, so if it ever is
cut we will have incorrect simulation results.
Change these into hard constraints instead. This necessitates adding a
flag on AstVar to ignore special variables constructed during V3Delayed
that might otherwise appear as degenerate logic loops. E.g.:
if (VdlySet) {
VdlySet = 0; // <- This write to VdlySet can and must be ignored
LHS = VdlyVal;
}
No functional change, but you might get an error if this constraint was
ever violated. (Theoretically it should never be, as these variables
were inserted in a way that does not require violating these constraints
...)
* Refactor V3Delay for extensibility
Introduce the concept of an "NBA Scheme", which is the lowering pattern
we can use for various variables that are the targets of NBAs.
E.g.:
- ShadowVariable (old default scheme)
- FlagShared (old array set flag scheme)
- ValueQueueWhole (recent dynamic commit queue)
We now analyse all AstAssignDly before making any decisions on which
scheme to apply. We then choose a specific scheme for each variable that
is the target of an NBA, and then all NBAs targeting that variable use
the same scheme. This enables easy mix and match of schemes as needed,
while remaining consistent by design after extensions.
Output is perturbed due to node insertion order, but no functional
or performance change is intended.