After conversion of Ast to Dfg, but before synthesizing AstAlways into
primitives, run a pass to remove variables that are not observable, and
all logic that only computes such variables. This can get rid of a lot
of content early so we don't build redundant Dfgs, and also enables
synthesizing always blocks that use temporaries only in some branches,
which will come in a follow up.
Many rules in the Dfg Peephole pass check if a node has more than one
sinks. Redundant variables that will ultimately be removed can prevent
these from matching. Remove such variables during the Peeophole pass
itself to enable more matches.
Note this might miss some cases where a sub-tree within an And/Or/Xor
tree is optimizeable, but not the whole tree, but in practice this seems
to work better than the alternative of keeping a set of failed nodes and
bail early.
This patch adds a heuristic to V3SplitVar, and it attempts to split up
packed variables that are only referenced via constant index,
non-overlapping bit/range selects. This can eliminate some UNOPTFLAT cases.
The DFG peephole pass converts all associative trees into right leaning,
which is good for simplifying pattern recognition, but can lead to an
excessive amount of wide intermediate results being constructed for
right leaning concatenations.
Add a new pass to balance concatenation trees by trying to:
- Create VL_EDATASIZE (32-bit) sub-terms, so words can then be packed
easily afterwards
- Try to ensure the operands of a concat are roughly the same width
within a concatenation tree. This does not yield the shortest tree,
but it ensures it has many sub-nodes that are small enough to fit into
machine registers.
This can eliminate a lot of wide intermediate results, which would need
temporaries, and also increases ILP within sub-expressions (assuming the
C compiler can't figure that out itself).
This is over 2x run-time speedup on the high_perf configuration of
VeeR EH2 (which you could arguably also get with -fno-dfg, but oh well).