Commit Graph

29 Commits

Author SHA1 Message Date
Geza Lore 0fed5f8b3e
Avoid creating redundant vertices in V3DfgPeephole (#4944)
Add a new data-structure V3DfgCache, which can be used to retrieve
existing vertices with some given inputs vertices. Use this in
V3DfgPeephole to eliminate the creation of redundant vertices.

Overall this is performance neutral, but is in prep for some future
work.
2024-03-06 18:01:52 +00:00
Wilson Snyder e76f29e5ba Copyright year update 2024-01-01 03:19:59 -05:00
Wilson Snyder f3ae4b8786 Fix spelling 2023-11-10 23:25:53 -05:00
Wilson Snyder b5828a7ce9 Fix header order botched by clang-format in recent commit. 2023-10-18 06:37:46 -04:00
github action 770cd24f27 Apply 'make format' 2023-10-18 02:50:27 +00:00
Wilson Snyder 431bb1ed16
Support compiling Verilator with gcc/clang precompiled headers (#4579) 2023-10-17 22:49:28 -04:00
Mariusz Glebocki 28bd7e5b19
Rework multithreading handling to separate by code units that use/never use it. (#4228) 2023-09-24 22:12:23 -04:00
Geza Lore 3069860fdf Allow mismatched widths in operands of shifts in DFG
Fixes #3872.

Testing this is a bit tricky, as the front-end fixes up the operand
widths in shifts to match, and we need V3Const to introduce a mismatched
one by reducing `4'd2 ** x` (with x being 2 2-bit wide signal) to `4'd1
<< x`, but t_dfg_peephole runs with V3Const disabled exactly because it
makes it hard to write tests. Rather than fixing this one case in
V3Const (which we should do systematically at some point), I fixed DFG
to accept these just in case V3Const generates more of them. The
assertions were there only because of paranoia (as I thought these were
not possible inputs), the code otherwise works.
2023-01-22 10:55:03 +00:00
Wilson Snyder b24d7c83d3 Copyright year update 2023-01-01 10:18:39 -05:00
Larry Doolittle f27cf4c804
Commentary: Fix spelling in C++ comments (#3797) (#3798) 2022-12-02 18:46:38 -05:00
Geza Lore 508e937164 Fix tautological predicate in V3DfgPeephole
Fixes #3771
2022-11-22 15:05:34 +00:00
Geza Lore 2a3eabff73 Various Dfg performance improvements 2022-11-06 15:54:51 +00:00
Geza Lore fb9ec03c3f DfgPeephole: Use a work list driven algorithm for speed
Replace the 'run to fixed point' algorithm with a work list driven
approach. Instead of marking the graph as changed, we explicitly add
vertices to the work list, to be visited, when a vertex is changed. This
improves both memory locality (as the work list is processed in last in
first out order), and removed unnecessary visitations when only a few
nodes changes.
2022-11-05 20:31:09 +00:00
Geza Lore 0675510eb9 DFG: Fix attempted evaluation of constants wider than 32 bits
Fixes #3718
2022-10-28 17:14:19 +01:00
Geza Lore 5c65e0cfa1 Dfg: Fix incorrect folding of associative expressions with shared terms
Fixes #3679
2022-10-17 15:03:30 +01:00
Geza Lore c033a0d7c8 Optimize DfgGraph vertex storage
Vertices representing variables (DfgVertexVar) and constants (DfgConst)
are very common (40-50% of all vertices created in some large designs),
and we also need to, or can treat them specially in algorithms. Keep
these as separate lists in DfgGraph for direct access to them. This
improve verilation speed.
2022-10-08 12:46:02 +01:00
Geza Lore 90447d54d1 Make DfgConst hold V3Number directly
Remove intermediary AstConst. No functional change intended.
2022-10-08 12:46:02 +01:00
Geza Lore 29a080dd9b DFG: Special case representation of AstSel
AstSel is a ternary node, but the 'widthp' is always constant and is
hence redundant, and 'lsbp' is very often constant. As AstSel is fairly
common, we special case as a DfgSel for the constant 'lsbp', and as
'DfgMux` for the non-constant 'lsbp'.
2022-10-06 19:59:01 +01:00
Geza Lore 6fa14bf029 Speed up DfgPeephole in various ways 2022-10-06 19:59:01 +01:00
Geza Lore a83043d735 DfgPeephole: Rework folding of associative operations
Allow constant folding through adjacent nodes of all associative
operations, for example '((a & 2) & 3)' or '(3 & (2 & a))' can now be
folded into '(a & 2)' and '(2 & a)' respectively. Also improve speed of
making associative expression trees right leaning by using rotation of
the existing vertices whenever instead of allocation of new nodes.
2022-10-06 09:10:26 +01:00
Geza Lore 22fcd616aa DfgPeephole: Further restrict PUSH_REDUCTION_THROUGH_CONCAT
Only apply when there is guaranteed to be a subsequent constant folding
and elimination of some of the expression, otherwise this sometimes
interferes with the simplification of concatenations and harms overall
performance.
2022-10-06 09:10:26 +01:00
Geza Lore f87fe4c3b4 DfgPeephole: add constant folding for all integer types
Also added a testing only -fno-const-before-dfg option, as otherwise
V3Const eats up a lot of the simple inputs. A lot of the things V3Const
swallows in the simple cases can make it to DFG in complex cases, or DFG
itself can create them during optimization. In any case to save
complexity of testing DFG constant folding, we use this option to turn
off V3Const prior to the DFG passes in the relevant test.
2022-10-05 12:05:40 +01:00
Geza Lore f23f3ca907 Try to ensure DFG peephole patterns don't grow the graph
Some optimizations are only a net win if they help us remove a graph
node (or at least ensure they don't grow the graph), or yields otherwise
special logic, so try to apply them only in these cases.
2022-10-04 18:54:46 +01:00
Geza Lore 965d99f1bc DFG: Make implementation more similar to AST
Use the same style, and reuse the bulk of astgen to generate DfgVertex
related code. In particular allow for easier definition of custom
DfgVertex sub-types that do not directly correspond to an AstNode
sub-type. Also introduces specific names for the fixed arity vertices.
No functional change intended.
2022-10-04 15:49:30 +01:00
Geza Lore 84b9502af4 DFG: Add more peephole patterns 2022-10-01 16:46:58 +01:00
Geza Lore acebafcbc2 DFG: Partial support for unpacked arrays
Representation and Ast / Dfg conversions available, for element-wise
access only. Not much optimization yet (only CSE).
2022-09-29 19:00:45 +01:00
Geza Lore 4a1a2def95 DFG: make variable inlining part of the peephole optimizer
This saves some traversals and prepares us to better handle cyclic DFGs.
2022-09-29 18:40:10 +01:00
Geza Lore 17976d7401 DFG: fix REPLACE_EQ_OF_CONST_AND_CONST peephole pattern 2022-09-29 18:40:10 +01:00
Geza Lore 47bce4157d
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.

This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.

The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.

For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.

The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.

Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 16:46:22 +01:00