Add a new Dfg pass 'pushDownSel'. This will try to move selects through
a tree of concatenations in order to eliminate temporary nodes holding
intermediate concatenation results. This can get rid of a lot of
variables when packed arrays are assigned in parts (e.g. bit-wise).
We use special C++ types for ports, e.g. SystemC types in --sc mode, and native C arrays for unpacked arrays in --cc mode. These types are not substitutable for internal types, e.g. VlUnpacked, however all the runtime primitives expect internal types.
I think the intention was to use these special IO types only for top level ports, but the current implementation also uses them for the ports of all non-inlined modules. This means the output C++ will not compile if such a port is passed to a runtime primitive (e.g. array 'sort' as in the new test) or DPI import.
Changed to use the special IO types only on the top level ports.
Note these are likely still broken if attempting to invoke on a top level port (we might be saved by wrapTop, but later optimizations might eliminate the intermediary)
Re-inline ConstPool entries in V3Subst that have been expanded into
word-wise accessed by V3Expand. This enables downstream constant folding
on the word-wise expressions.
As V3Subst now understands ConstPool entries, we can also omit expanding
straight assignments with a ConstPool entry on the RHS. This allows the
C++ compiler to see the memcpy directly.
V3Expand wide SHIFTL and SHIFTR if the shift amount is know and is a
multiple of VL_EDATA_SIZE. This case results in each word requiring a
simple copy from the original, or store of a constant zero, which
subsequent V3Subst can then eliminate.
Concatenations that are only used by Sel expressions that do not consume
some bits on the edges can be narrowed to not compute the unused bits.
E.g.: `{a[4:0], b[4:0]}[5:4]` -> `{a[0], b[4]}[1:0]`
This is a superset or the PUSH_SEL_THROUGH_CONCAT DFG pattern, which is
removed.