Extend the decoder-pattern case optimization to selectors that are too
wide for a full 2^width lookup table. A decoder-pattern case (where
every case item assigns constants to a fixed set of LHSs) is lowered to
a new AstMachMasked expression. AstMachMasked is emitted as a run-time
VL_MATCHMASKEd_* function call. It contains a packed constant pool table,
'matchp', which is a list of '(mask, bits)' pairs. At runtime, the index of the
first matching entry is returned, and is used to index a value table. This single
(albeit complicated) expression can replace large if-else trees whole, resulting
in much more compact code with fewer static hard to predict branches. It
is worth about 10% speed and 30% code size in some designs.
Example:
```systemverilog
logic [39:0] sel;
always_comb
casez (sel)
40'b???????????????????????????????????????1: out = 8'h01;
40'b??????????????????????????????????????1?: out = 8'h02;
40'b?????????????????????????????????????1??: out = 8'h03;
default: out = 8'hff;
endcase
```
is compiled to:
```c++
out = TABLE_value[VL_MATCHMASKED_Q(sel, CONST_match)];
```
Where 'CONST_match' contains 4 entries, of a 40-bit mask and 40-bit bit
pattern each, and 'TABLE_value' contains 4 entries of the corresponding
8-bit results. (Entries are aligned to word boundaries to avoid runtime
bit swizzling)
Teach DFG about CReset. This is not so much to optimize CReset itself, but to enable synthesizing logic involving CReset, which does appear with automatic variables used only in certain branches
Explicitly annotate those AstNodeExpr subclasses that should have a
corresponding DfgVertex subclass generated by astgen. This avoids having
to tweak things in Dfg when adding new AstNode subclasses, which can
then be handled separately.
Introduce a new DfgAstRd vertex, which holds an AstNodeExpr that is a
reference to a variable. This enables tracking all read references in
Dfg, which both enables more optimization, and allows inlining of
expressions in place of the reference more intelligently (e.g, when the
expression is only used once, and is not in a loop). This can get rid of
20-30% of temporary variables introduced in Dfg in some designs. Note
V3Gate later got rid of a lot of those, this is a step towards making
V3Gate redundant. The more intelligent expression inlining is worth ~10%
runtime speed on some designs.
Added a mini type system for Dfg using DfgDataType to replace Dfg's use
of AstNodeDType. This is much more restricted and represents only the
types Dfg can handle in a canonical form. This will be needed when
adding more support for unpacked arrays and maybe unpacked structs one
day.
Also added an internal type checker for DfgGraphs which encodes all the
assumptions the code makes about type relationships in the graph. Run
this in a few places with --debug-check. Fix resulting fallout.
Parts of this algorithm were distributed over many files some
masquerading as re-usable APIs, they were not. Move everything into one
file and avoid unnecessary virtual functions.