Extend the decoder-pattern case optimization to selectors that are too
wide for a full 2^width lookup table. A decoder-pattern case (where
every case item assigns constants to a fixed set of LHSs) is lowered to
a new AstMachMasked expression. AstMachMasked is emitted as a run-time
VL_MATCHMASKEd_* function call. It contains a packed constant pool table,
'matchp', which is a list of '(mask, bits)' pairs. At runtime, the index of the
first matching entry is returned, and is used to index a value table. This single
(albeit complicated) expression can replace large if-else trees whole, resulting
in much more compact code with fewer static hard to predict branches. It
is worth about 10% speed and 30% code size in some designs.
Example:
```systemverilog
logic [39:0] sel;
always_comb
casez (sel)
40'b???????????????????????????????????????1: out = 8'h01;
40'b??????????????????????????????????????1?: out = 8'h02;
40'b?????????????????????????????????????1??: out = 8'h03;
default: out = 8'hff;
endcase
```
is compiled to:
```c++
out = TABLE_value[VL_MATCHMASKED_Q(sel, CONST_match)];
```
Where 'CONST_match' contains 4 entries, of a 40-bit mask and 40-bit bit
pattern each, and 'TABLE_value' contains 4 entries of the corresponding
8-bit results. (Entries are aligned to word boundaries to avoid runtime
bit swizzling)
Add an overload that takes `const char*` format string. This avoids
having to construct/destruct a temporary `std::string` at the call site
when calling these functions with a string literal, which is fairly
common. This is worth 25% code size on some assertion heavy design.
Also use `std::string::clear()` instead of assigning an empty string
which is more efficient.
Change WDataInP/WDataOutP to be opaque handles types instead of aliases
to raw pointers. This subsequently eliminates needing an implicit cast
operator in VlWide, which is replaced with implicit constructors of
WDataInP/WDataOutP that can create a handle from a VlWide. This
eliminates some unsafe conversions that the previous implicit cast
operator unintentionally enabled (e.g. #7618). It also eliminates
having to insert ".data()" in various places int he generated code, which
simplifies internals (the only place ".data()" should be needed is in
calls to variadic functions where the expected type of the argument is
not WDataInP/WDataOutP).
The handles otherwise behave like pointers, implementing the minimal
amount of operators required to code the runtime. The handle is still
only a single pointer, and will be passed in registers as before, so
this patch should be performance neutral.
As part of this removed WData, which used to be an alias for EData.
All uses are now either EData*, WDataInP, WDataOutP, or VlWide directly.
Removed the VlTriggerVec type, and refactored to use an unpacked array
of 64-bit words instead. This means the trigger vector and its
operations are now the same as for any other unpacked array. The few
special functions required for operating on a trigger vector are now
generated in V3SchedTrigger as regular AstCFunc if needed.
No functional change intended, performance should be the same.