Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// -*- mode: C++; c-file-style: "cc-mode" -*-
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
// DESCRIPTION: Verilator: Data flow graph (DFG) representation of logic
|
|
|
|
|
//
|
|
|
|
|
// Code available from: https://verilator.org
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
2026-01-27 02:24:34 +01:00
|
|
|
// This program is free software; you can redistribute it and/or modify it
|
|
|
|
|
// under the terms of either the GNU Lesser General Public License Version 3
|
|
|
|
|
// or the Perl Artistic License Version 2.0.
|
|
|
|
|
// SPDX-FileCopyrightText: 2003-2026 Wilson Snyder
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
|
|
|
|
// This is a data-flow graph based representation of combinational logic,
|
|
|
|
|
// the main difference from a V3Graph is that DfgVertex owns the storage
|
|
|
|
|
// of it's input edges (operands/sources/arguments), and can access each
|
|
|
|
|
// input edge directly by indexing, making modifications more efficient
|
|
|
|
|
// than the linked list based structures used by V3Graph.
|
|
|
|
|
//
|
|
|
|
|
// A bulk of the DfgVertex sub-types are generated by astgen, and are
|
2022-10-04 12:03:41 +02:00
|
|
|
// analogous to the corresponding AstNode sub-types.
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
//
|
|
|
|
|
// See also the internals documentation docs/internals.rst
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
|
|
|
|
|
#ifndef VERILATOR_V3DFG_H_
|
|
|
|
|
#define VERILATOR_V3DFG_H_
|
|
|
|
|
|
|
|
|
|
#include "config_build.h"
|
|
|
|
|
#include "verilatedos.h"
|
|
|
|
|
|
|
|
|
|
#include "V3Ast.h"
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
#include "V3Cfg.h"
|
2025-09-07 21:38:50 +02:00
|
|
|
#include "V3DfgDataType.h"
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3Error.h"
|
2023-09-25 04:12:23 +02:00
|
|
|
#include "V3Global.h"
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3Hash.h"
|
|
|
|
|
#include "V3List.h"
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
#include "V3Dfg__gen_forward_class_decls.h" // From ./astgen
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
#include <algorithm>
|
2022-09-25 17:03:15 +02:00
|
|
|
#include <array>
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include <functional>
|
2022-10-06 12:26:11 +02:00
|
|
|
#include <new>
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include <type_traits>
|
|
|
|
|
#include <unordered_map>
|
|
|
|
|
#include <vector>
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
#ifndef VL_NOT_FINAL
|
|
|
|
|
#define VL_NOT_FINAL // This #define fixes broken code folding in the CLion IDE
|
|
|
|
|
#endif
|
|
|
|
|
|
2025-09-02 23:21:24 +02:00
|
|
|
// Can T be stored in the memory allocted for U?
|
|
|
|
|
template <typename T, typename U>
|
|
|
|
|
inline constexpr bool fitsSpaceAllocatedFor() {
|
|
|
|
|
return sizeof(T) <= sizeof(U) && alignof(T) <= alignof(U);
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
class DfgEdge;
|
2024-03-26 00:06:25 +01:00
|
|
|
class DfgVertex;
|
|
|
|
|
class DfgGraph;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
class DfgVisitor;
|
2025-09-02 23:21:24 +02:00
|
|
|
template <typename T_User, bool = fitsSpaceAllocatedFor<T_User, void*>()>
|
|
|
|
|
class DfgUserMap;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
//------------------------------------------------------------------------------
|
2025-09-02 17:50:40 +02:00
|
|
|
// Dataflow graph vertex type enum
|
2022-10-04 12:03:41 +02:00
|
|
|
|
|
|
|
|
class VDfgType final {
|
|
|
|
|
public:
|
|
|
|
|
#include "V3Dfg__gen_type_enum.h" // From ./astgen
|
2025-09-03 14:56:11 +02:00
|
|
|
const en m_e;
|
2025-08-19 23:02:10 +02:00
|
|
|
VDfgType() = delete;
|
2025-09-03 14:56:11 +02:00
|
|
|
|
|
|
|
|
// VDfgType is interconvetible with VDfgType::en
|
2022-10-04 12:03:41 +02:00
|
|
|
// cppcheck-suppress noExplicitConstructor
|
|
|
|
|
constexpr VDfgType(en _e)
|
|
|
|
|
: m_e{_e} {}
|
|
|
|
|
constexpr operator en() const { return m_e; }
|
|
|
|
|
};
|
|
|
|
|
constexpr bool operator==(VDfgType lhs, VDfgType rhs) { return lhs.m_e == rhs.m_e; }
|
|
|
|
|
constexpr bool operator==(VDfgType lhs, VDfgType::en rhs) { return lhs.m_e == rhs; }
|
|
|
|
|
constexpr bool operator==(VDfgType::en lhs, VDfgType rhs) { return lhs == rhs.m_e; }
|
|
|
|
|
inline std::ostream& operator<<(std::ostream& os, const VDfgType& t) { return os << t.ascii(); }
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Dataflow graph edge
|
|
|
|
|
class DfgEdge final {
|
|
|
|
|
friend class DfgVertex;
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgVertex* m_srcp = nullptr; // The source vertex driving this edge - might be unconnected
|
|
|
|
|
DfgVertex* const m_dstp; // The vertex driven by this edge, which owns this edge, so immutable
|
|
|
|
|
V3ListLinks<DfgEdge> m_links; // V3List links in the list of sinks of m_srcp
|
|
|
|
|
|
|
|
|
|
DfgEdge() = delete;
|
|
|
|
|
VL_UNCOPYABLE(DfgEdge);
|
|
|
|
|
VL_UNMOVABLE(DfgEdge);
|
|
|
|
|
|
|
|
|
|
V3ListLinks<DfgEdge>& links() { return m_links; }
|
|
|
|
|
using List = V3List<DfgEdge, &DfgEdge::links>;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
public:
|
2025-09-02 17:50:40 +02:00
|
|
|
explicit DfgEdge(DfgVertex* dstp)
|
|
|
|
|
: m_dstp{dstp} {}
|
|
|
|
|
~DfgEdge() { unlinkSrcp(); }
|
2022-09-25 17:03:15 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// The source (driver) of this edge
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgVertex* srcp() const { return m_srcp; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// The sink (consumer) of this edge
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgVertex* dstp() const { return m_dstp; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Remove driver of this edge
|
2025-09-02 17:50:40 +02:00
|
|
|
inline void unlinkSrcp();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Relink this edge to be driven from the given new source vertex
|
2025-09-02 17:50:40 +02:00
|
|
|
inline void relinkSrcp(DfgVertex* srcp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Dataflow graph vertex
|
|
|
|
|
class DfgVertex VL_NOT_FINAL {
|
|
|
|
|
friend class DfgGraph;
|
|
|
|
|
friend class DfgEdge;
|
|
|
|
|
friend class DfgVisitor;
|
2025-09-02 23:21:24 +02:00
|
|
|
template <typename, bool>
|
|
|
|
|
friend class DfgUserMap;
|
2022-10-06 12:26:11 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// STATE
|
2025-09-02 17:50:40 +02:00
|
|
|
V3ListLinks<DfgVertex> m_links; // V3List links in the DfgGraph
|
|
|
|
|
std::vector<std::unique_ptr<DfgEdge>> m_inputps; // Input edges, as vector, for fast indexing
|
|
|
|
|
DfgEdge::List m_sinks; // List of sink edges of this vertex
|
2024-03-26 00:06:25 +01:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
FileLine* const m_filelinep; // Source location
|
2025-09-07 21:38:50 +02:00
|
|
|
const DfgDataType& m_dtype; // Data type of the result of this vertex
|
2022-10-06 12:26:11 +02:00
|
|
|
const VDfgType m_type; // Vertex type tag
|
2025-09-02 23:21:24 +02:00
|
|
|
|
|
|
|
|
// The only way to access thes is via DfgUserMap, so mutable is appropriate,
|
|
|
|
|
// the map can change while the keys (DfgVertex) are const.
|
|
|
|
|
mutable uint32_t m_userGeneration = 0; // User data generation number
|
|
|
|
|
mutable void* m_userStorage = nullptr; // User data storage - one pointer worth
|
|
|
|
|
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
DfgGraph* m_dfgp = nullptr; // Graph this vertex belongs to
|
|
|
|
|
#endif
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// METHODS
|
|
|
|
|
// Visitor accept method
|
|
|
|
|
virtual void accept(DfgVisitor& v) = 0;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Acessor for type List
|
2024-03-26 00:06:25 +01:00
|
|
|
V3ListLinks<DfgVertex>& links() { return m_links; }
|
|
|
|
|
|
|
|
|
|
public:
|
|
|
|
|
// List type that can store Vertex (which must be a DfgVertex) instances via m_links
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
using List = V3List<DfgVertex, &DfgVertex::links, Vertex>;
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
protected:
|
|
|
|
|
// CONSTRUCTOR
|
2025-09-07 21:38:50 +02:00
|
|
|
DfgVertex(DfgGraph& dfg, VDfgType type, FileLine* flp, const DfgDataType& dt) VL_MT_DISABLED;
|
2025-09-02 17:50:40 +02:00
|
|
|
// Use unlinkDelete instead
|
|
|
|
|
virtual ~DfgVertex() VL_MT_DISABLED = default;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Create a new input edge and return it
|
|
|
|
|
DfgEdge* newInput() {
|
|
|
|
|
m_inputps.emplace_back(new DfgEdge{this});
|
|
|
|
|
return m_inputps.back().get();
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
public:
|
2025-09-02 17:50:40 +02:00
|
|
|
// Get input 'i'
|
|
|
|
|
DfgVertex* inputp(size_t i) const { return m_inputps[i]->srcp(); }
|
|
|
|
|
// Relink input 'i'
|
|
|
|
|
void inputp(size_t i, DfgVertex* vtxp) { m_inputps[i]->relinkSrcp(vtxp); }
|
|
|
|
|
// The number of inputs this vertex has. Some might be unconnected.
|
|
|
|
|
size_t nInputs() const { return m_inputps.size(); }
|
2026-03-21 23:45:02 +01:00
|
|
|
// Unlink all inputs and reset to no inputs - use very carefully
|
|
|
|
|
void resetInputs() { m_inputps.clear(); }
|
2025-09-02 17:50:40 +02:00
|
|
|
|
|
|
|
|
// The type of this vertex
|
|
|
|
|
VDfgType type() const { return m_type; }
|
|
|
|
|
// Source location
|
|
|
|
|
FileLine* fileline() const { return m_filelinep; }
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// The data type of the result of the vertex
|
2025-09-07 21:38:50 +02:00
|
|
|
const DfgDataType& dtype() const { return m_dtype; }
|
|
|
|
|
// Shorthands for accessors of 'dtype()'
|
|
|
|
|
bool isPacked() const { return m_dtype.isPacked(); }
|
|
|
|
|
bool isArray() const { return m_dtype.isArray(); }
|
|
|
|
|
uint32_t size() const { return m_dtype.size(); }
|
|
|
|
|
// Type check + size
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
uint32_t width() const {
|
2025-09-07 21:38:50 +02:00
|
|
|
UASSERT_OBJ(m_dtype.isPacked(), this, "Non packed vertex has no 'width'");
|
|
|
|
|
return m_dtype.size();
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-20 09:38:36 +01:00
|
|
|
// Type check vertex (for debugging)
|
|
|
|
|
void typeCheck(const DfgGraph& dfg) const;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Predicate: has 1 or more sinks
|
2025-09-02 17:50:40 +02:00
|
|
|
bool hasSinks() const { return !m_sinks.empty(); }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Predicate: has 2 or more sinks
|
2025-09-02 17:50:40 +02:00
|
|
|
bool hasMultipleSinks() const { return m_sinks.hasMultipleElements(); }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Fanout (number of sinks) of this vertex (expensive to compute)
|
2023-09-25 04:12:23 +02:00
|
|
|
uint32_t fanout() const VL_MT_DISABLED;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-06-16 13:25:44 +02:00
|
|
|
// Return a canonical variable vertex that holds the value of this vertex,
|
|
|
|
|
// or nullptr if no such variable exists in the graph. This is O(fanout).
|
2025-07-14 23:09:34 +02:00
|
|
|
DfgVertexVar* getResultVar() VL_MT_DISABLED;
|
2025-06-16 13:25:44 +02:00
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
// Cache type for 'scopep' below
|
|
|
|
|
using ScopeCache = std::unordered_map<const DfgVertex*, AstScope*>;
|
|
|
|
|
|
|
|
|
|
// Retrieve the prefred AstScope this vertex belongs to. For variable
|
|
|
|
|
// vertices this is defined. For operation vertices, we try to find a
|
|
|
|
|
// scope based on variables in the upstream logic cone (inputs). If
|
|
|
|
|
// there isn't one, (beceuse the whole upstream cone is constant...),
|
|
|
|
|
// then the root scope is returned. If 'tryResultVar' is true, we will
|
|
|
|
|
// condier the scope of 'getResultVar' first, if it exists.
|
|
|
|
|
// Only call this with a scoped DfgGraph
|
|
|
|
|
AstScope* scopep(ScopeCache& cache, bool tryResultVar = false) VL_MT_DISABLED;
|
|
|
|
|
|
2025-06-17 00:14:24 +02:00
|
|
|
// If the node has a single sink, return it, otherwise return nullptr
|
|
|
|
|
DfgVertex* singleSink() const {
|
2025-09-02 17:50:40 +02:00
|
|
|
return m_sinks.hasSingleElement() ? m_sinks.frontp()->dstp() : nullptr;
|
2025-06-17 00:14:24 +02:00
|
|
|
}
|
|
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// First sink of the vertex, if any, otherwise nullptr
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgVertex* firtsSinkp() { return m_sinks.empty() ? nullptr : m_sinks.frontp()->dstp(); }
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Unlink from container (graph or builder), then delete this vertex
|
2023-09-25 04:12:23 +02:00
|
|
|
void unlinkDelete(DfgGraph& dfg) VL_MT_DISABLED;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Relink all sinks to be driven from the given new source
|
2025-09-02 17:50:40 +02:00
|
|
|
void replaceWith(DfgVertex* vtxp) {
|
2025-09-07 21:38:50 +02:00
|
|
|
UASSERT_OBJ(vtxp != this, this, "Replacing DfgVertex with itself");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtype() == dtype(), this, "Replacement DfgVertex has different type");
|
2025-09-02 17:50:40 +02:00
|
|
|
while (!m_sinks.empty()) m_sinks.frontp()->relinkSrcp(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Calls given function 'f' for each source vertex of this vertex. If 'f'
|
|
|
|
|
// returns true, further sources are not iterated and this method returns
|
|
|
|
|
// true itself. Unconnected source edges are not iterated.
|
|
|
|
|
bool foreachSource(std::function<bool(DfgVertex&)> f) {
|
|
|
|
|
for (const std::unique_ptr<DfgEdge>& edgep : m_inputps) {
|
|
|
|
|
if (DfgVertex* const srcp = edgep->srcp()) {
|
|
|
|
|
if (f(*srcp)) return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Calls given function 'f' for each source vertex of this vertex. If 'f'
|
|
|
|
|
// returns true, further sources are not iterated and this method returns
|
|
|
|
|
// true itself. Unconnected source edges are not iterated.
|
|
|
|
|
bool foreachSource(std::function<bool(const DfgVertex&)> f) const {
|
|
|
|
|
for (const std::unique_ptr<DfgEdge>& edgep : m_inputps) {
|
|
|
|
|
if (DfgVertex* const srcp = edgep->srcp()) {
|
|
|
|
|
if (f(*srcp)) return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Calls given function 'f' for each sink vertex of this vertex. If 'f'
|
|
|
|
|
// returns true, further sinks are not iterated and this method returns
|
|
|
|
|
// true itself. Unlinking/deleting the given sink during iteration is safe,
|
|
|
|
|
// but not other sinks of this vertex.
|
|
|
|
|
bool foreachSink(std::function<bool(DfgVertex&)> f) {
|
|
|
|
|
for (const DfgEdge* const edgep : m_sinks.unlinkable()) {
|
|
|
|
|
if (f(*edgep->dstp())) return true;
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Calls given function 'f' for each sink vertex of this vertex. If 'f'
|
|
|
|
|
// returns true, further sinks are not iterated and this method returns
|
|
|
|
|
// true itself.
|
|
|
|
|
bool foreachSink(std::function<bool(const DfgVertex&)> f) const {
|
|
|
|
|
for (const DfgEdge& edge : m_sinks) {
|
|
|
|
|
if (f(*edge.dstp())) return true;
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-10 13:38:49 +02:00
|
|
|
// Is this vertex cheaper to re-compute than to load out of memoy
|
|
|
|
|
inline bool isCheaperThanLoad() const;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Methods that allow DfgVertex to participate in error reporting/messaging
|
2025-10-03 12:49:13 +02:00
|
|
|
// LCOV_EXCL_START
|
2023-08-29 22:54:32 +02:00
|
|
|
void v3errorEnd(std::ostringstream& str) const VL_RELEASE(V3Error::s().m_mutex) {
|
2023-02-10 04:15:37 +01:00
|
|
|
m_filelinep->v3errorEnd(str);
|
|
|
|
|
}
|
|
|
|
|
void v3errorEndFatal(std::ostringstream& str) const VL_ATTR_NORETURN
|
2023-08-29 22:54:32 +02:00
|
|
|
VL_RELEASE(V3Error::s().m_mutex) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
m_filelinep->v3errorEndFatal(str);
|
|
|
|
|
}
|
2023-02-10 04:15:37 +01:00
|
|
|
string warnContextPrimary() const VL_REQUIRES(V3Error::s().m_mutex) {
|
|
|
|
|
return fileline()->warnContextPrimary();
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
string warnContextSecondary() const { return fileline()->warnContextSecondary(); }
|
2023-02-10 04:15:37 +01:00
|
|
|
string warnMore() const VL_REQUIRES(V3Error::s().m_mutex) { return fileline()->warnMore(); }
|
|
|
|
|
string warnOther() const VL_REQUIRES(V3Error::s().m_mutex) { return fileline()->warnOther(); }
|
2025-10-03 12:49:13 +02:00
|
|
|
// LCOV_EXCL_STOP
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
private:
|
|
|
|
|
// For internal use only.
|
|
|
|
|
// Note: specializations for particular vertex types are provided by 'astgen'
|
|
|
|
|
template <typename T>
|
|
|
|
|
inline static bool privateTypeTest(const DfgVertex* nodep);
|
|
|
|
|
|
|
|
|
|
public:
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Subtype test
|
|
|
|
|
template <typename T>
|
|
|
|
|
bool is() const {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertex, T>::value, "'T' must be a subtype of DfgVertex");
|
2022-10-04 12:03:41 +02:00
|
|
|
return privateTypeTest<typename std::remove_cv<T>::type>(this);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Ensure subtype, then cast to that type
|
|
|
|
|
template <typename T>
|
|
|
|
|
T* as() {
|
|
|
|
|
UASSERT_OBJ(is<T>(), this,
|
|
|
|
|
"DfgVertex is not of expected type, but instead has type '" << typeName()
|
|
|
|
|
<< "'");
|
|
|
|
|
return static_cast<T*>(this);
|
|
|
|
|
}
|
|
|
|
|
template <typename T>
|
|
|
|
|
const T* as() const {
|
|
|
|
|
UASSERT_OBJ(is<T>(), this,
|
|
|
|
|
"DfgVertex is not of expected type, but instead has type '" << typeName()
|
|
|
|
|
<< "'");
|
|
|
|
|
return static_cast<const T*>(this);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Cast to subtype, or null if different
|
|
|
|
|
template <typename T>
|
|
|
|
|
T* cast() {
|
|
|
|
|
return is<T>() ? static_cast<T*>(this) : nullptr;
|
|
|
|
|
}
|
|
|
|
|
template <typename T>
|
|
|
|
|
const T* cast() const {
|
|
|
|
|
return is<T>() ? static_cast<const T*>(this) : nullptr;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Human-readable vertex type as string for debugging
|
2025-09-02 17:50:40 +02:00
|
|
|
std::string typeName() const { return m_type.ascii(); }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Human-readable name for source operand with given index for debugging
|
2025-09-02 17:50:40 +02:00
|
|
|
virtual std::string srcName(size_t idx) const = 0;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// DfgVertex visitor
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
class DfgVisitor VL_NOT_FINAL {
|
|
|
|
|
public:
|
|
|
|
|
// Dispatch to most specific 'visit' method on 'vtxp'
|
|
|
|
|
void iterate(DfgVertex* vtxp) { vtxp->accept(*this); }
|
2025-09-02 17:50:40 +02:00
|
|
|
// Least specific visit method is abstract
|
2022-10-04 12:03:41 +02:00
|
|
|
virtual void visit(DfgVertex* nodep) = 0;
|
|
|
|
|
#include "V3Dfg__gen_visitor_decls.h" // From ./astgen
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
// DfgVertex subclasses
|
|
|
|
|
#include "V3DfgVertices.h"
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Specializations of privateTypeTest
|
|
|
|
|
#include "V3Dfg__gen_type_tests.h" // From ./astgen
|
2022-10-04 12:03:41 +02:00
|
|
|
|
2024-03-26 00:06:25 +01:00
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Dataflow graph
|
|
|
|
|
class DfgGraph final {
|
2025-09-02 23:21:24 +02:00
|
|
|
friend class DfgUserMapBase;
|
2024-03-26 00:06:25 +01:00
|
|
|
|
|
|
|
|
// MEMBERS
|
|
|
|
|
|
|
|
|
|
// Variables and constants make up a significant proportion of vertices (40-50% was observed
|
|
|
|
|
// in large designs), and they can often be treated specially in algorithms, which in turn
|
|
|
|
|
// enables significant Verilation performance gains, so we keep these in separate lists for
|
|
|
|
|
// direct access.
|
|
|
|
|
DfgVertex::List<DfgVertexVar> m_varVertices; // The variable vertices in the graph
|
|
|
|
|
DfgVertex::List<DfgConst> m_constVertices; // The constant vertices in the graph
|
|
|
|
|
DfgVertex::List<DfgVertex> m_opVertices; // The operation vertices in the graph
|
|
|
|
|
size_t m_size = 0; // Number of vertices in the graph
|
2025-07-01 23:55:08 +02:00
|
|
|
// Parent of the graph (i.e.: the module containing the logic represented by this graph),
|
|
|
|
|
// or nullptr when run after V3Scope
|
2024-03-26 00:06:25 +01:00
|
|
|
AstModule* const m_modulep;
|
2025-06-16 13:25:44 +02:00
|
|
|
const std::string m_name; // Name of graph - need not be unique
|
|
|
|
|
std::string m_tmpNameStub{""}; // Name stub for temporary variables - computed lazy
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-09-02 23:21:24 +02:00
|
|
|
// The only way to access thes is via DfgUserMap, so mutable is appropriate,
|
|
|
|
|
// the map can change while the graph is const.
|
|
|
|
|
mutable bool m_vertexUserInUse = false; // Vertex user data currently in use
|
|
|
|
|
mutable uint32_t m_vertexUserGeneration = 0; // Vertex user data generation counter
|
|
|
|
|
|
2024-03-26 00:06:25 +01:00
|
|
|
public:
|
|
|
|
|
// CONSTRUCTOR
|
2025-07-01 23:55:08 +02:00
|
|
|
explicit DfgGraph(AstModule* modulep, const string& name = "") VL_MT_DISABLED;
|
2024-03-26 00:06:25 +01:00
|
|
|
~DfgGraph() VL_MT_DISABLED;
|
|
|
|
|
VL_UNCOPYABLE(DfgGraph);
|
|
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
public:
|
|
|
|
|
// Number of vertices in this graph
|
|
|
|
|
size_t size() const { return m_size; }
|
2025-07-01 23:55:08 +02:00
|
|
|
// Parent module - or nullptr when run after V3Scope
|
2024-03-26 00:06:25 +01:00
|
|
|
AstModule* modulep() const { return m_modulep; }
|
|
|
|
|
// Name of this graph
|
|
|
|
|
const string& name() const { return m_name; }
|
|
|
|
|
|
2025-09-02 23:21:24 +02:00
|
|
|
// Create a new DfgUserMap
|
|
|
|
|
template <typename T_User>
|
|
|
|
|
inline DfgUserMap<T_User> makeUserMap() const;
|
2024-03-26 00:06:25 +01:00
|
|
|
|
|
|
|
|
// Access to vertex lists
|
|
|
|
|
DfgVertex::List<DfgVertexVar>& varVertices() { return m_varVertices; }
|
|
|
|
|
const DfgVertex::List<DfgVertexVar>& varVertices() const { return m_varVertices; }
|
|
|
|
|
DfgVertex::List<DfgConst>& constVertices() { return m_constVertices; }
|
|
|
|
|
const DfgVertex::List<DfgConst>& constVertices() const { return m_constVertices; }
|
|
|
|
|
DfgVertex::List<DfgVertex>& opVertices() { return m_opVertices; }
|
|
|
|
|
const DfgVertex::List<DfgVertex>& opVertices() const { return m_opVertices; }
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// Add DfgVertex to this graph (assumes not yet contained).
|
|
|
|
|
void addVertex(DfgVertex& vtx) {
|
2025-09-02 23:21:24 +02:00
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
UASSERT_OBJ(!vtx.m_dfgp, &vtx, "Vertex already in a graph");
|
|
|
|
|
#endif
|
2025-09-02 17:50:40 +02:00
|
|
|
// Note: changes here need to be replicated in DfgGraph::mergeGraphs
|
|
|
|
|
++m_size;
|
|
|
|
|
if (DfgConst* const cVtxp = vtx.cast<DfgConst>()) {
|
|
|
|
|
m_constVertices.linkBack(cVtxp);
|
|
|
|
|
} else if (DfgVertexVar* const vVtxp = vtx.cast<DfgVertexVar>()) {
|
|
|
|
|
m_varVertices.linkBack(vVtxp);
|
|
|
|
|
} else {
|
|
|
|
|
m_opVertices.linkBack(&vtx);
|
|
|
|
|
}
|
2025-09-02 23:21:24 +02:00
|
|
|
vtx.m_userGeneration = 0;
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
vtx.m_dfgp = this;
|
|
|
|
|
#endif
|
2025-09-02 17:50:40 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Remove DfgVertex form this graph (assumes it is contained).
|
|
|
|
|
void removeVertex(DfgVertex& vtx) {
|
2025-09-02 23:21:24 +02:00
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
UASSERT_OBJ(vtx.m_dfgp == this, &vtx, "Vertex not in this graph");
|
|
|
|
|
#endif
|
2025-09-02 17:50:40 +02:00
|
|
|
// Note: changes here need to be replicated in DfgGraph::mergeGraphs
|
|
|
|
|
--m_size;
|
|
|
|
|
if (DfgConst* const cVtxp = vtx.cast<DfgConst>()) {
|
|
|
|
|
m_constVertices.unlink(cVtxp);
|
|
|
|
|
} else if (DfgVertexVar* const vVtxp = vtx.cast<DfgVertexVar>()) {
|
|
|
|
|
m_varVertices.unlink(vVtxp);
|
|
|
|
|
} else {
|
|
|
|
|
m_opVertices.unlink(&vtx);
|
|
|
|
|
}
|
2025-09-02 23:21:24 +02:00
|
|
|
vtx.m_userGeneration = 0;
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
vtx.m_dfgp = nullptr;
|
|
|
|
|
#endif
|
2025-09-02 17:50:40 +02:00
|
|
|
}
|
|
|
|
|
|
2024-03-26 00:06:25 +01:00
|
|
|
// Calls given function 'f' for each vertex in the graph. It is safe to manipulate any vertices
|
|
|
|
|
// in the graph, or to delete/unlink the vertex passed to 'f' during iteration. It is however
|
|
|
|
|
// not safe to delete/unlink any vertex in the same graph other than the one passed to 'f'.
|
2025-09-02 17:50:40 +02:00
|
|
|
void forEachVertex(std::function<void(DfgVertex&)> f) {
|
|
|
|
|
for (DfgVertexVar* const vtxp : m_varVertices.unlinkable()) f(*vtxp);
|
|
|
|
|
for (DfgConst* const vtxp : m_constVertices.unlinkable()) f(*vtxp);
|
|
|
|
|
for (DfgVertex* const vtxp : m_opVertices.unlinkable()) f(*vtxp);
|
|
|
|
|
}
|
2024-03-26 00:06:25 +01:00
|
|
|
|
|
|
|
|
// 'const' variant of 'forEachVertex'. No mutation allowed.
|
2025-09-02 17:50:40 +02:00
|
|
|
void forEachVertex(std::function<void(const DfgVertex&)> f) const {
|
|
|
|
|
for (const DfgVertexVar& vtx : m_varVertices) f(vtx);
|
|
|
|
|
for (const DfgConst& vtx : m_constVertices) f(vtx);
|
|
|
|
|
for (const DfgVertex& vtx : m_opVertices) f(vtx);
|
|
|
|
|
}
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-07-10 19:46:45 +02:00
|
|
|
// Return an identical, independent copy of this graph. Vertex and edge order might differ.
|
|
|
|
|
std::unique_ptr<DfgGraph> clone() const VL_MT_DISABLED;
|
|
|
|
|
|
2025-08-05 13:11:02 +02:00
|
|
|
// Merge contents of other graphs into this graph. Deletes the other graphs.
|
|
|
|
|
// DfgVertexVar instances representing the same Ast variable are unified.
|
|
|
|
|
void mergeGraphs(std::vector<std::unique_ptr<DfgGraph>>&& otherps) VL_MT_DISABLED;
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-06-16 13:25:44 +02:00
|
|
|
// Genarete a unique name. The provided 'prefix' and 'n' values will be part of the name, and
|
|
|
|
|
// must be unique (as a pair) in each invocation for this graph.
|
|
|
|
|
std::string makeUniqueName(const std::string& prefix, size_t n) VL_MT_DISABLED;
|
|
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
// Create a new variable with the given name and data type. For a Scoped
|
|
|
|
|
// Dfg, the AstScope where the corresponding AstVarScope will be inserted
|
|
|
|
|
// must be provided
|
2025-09-07 21:38:50 +02:00
|
|
|
DfgVertexVar* makeNewVar(FileLine*, const std::string& name, const DfgDataType&,
|
2025-07-01 23:55:08 +02:00
|
|
|
AstScope*) VL_MT_DISABLED;
|
2025-06-16 13:25:44 +02:00
|
|
|
|
2024-03-26 00:06:25 +01:00
|
|
|
// Split this graph into individual components (unique sub-graphs with no edges between them).
|
|
|
|
|
// Also removes any vertices that are not weakly connected to any variable.
|
|
|
|
|
// Leaves 'this' graph empty.
|
2025-08-20 19:21:24 +02:00
|
|
|
std::vector<std::unique_ptr<DfgGraph>>
|
|
|
|
|
splitIntoComponents(const std::string& label) VL_MT_DISABLED;
|
2024-03-26 00:06:25 +01:00
|
|
|
|
|
|
|
|
// Extract cyclic sub-graphs from 'this' graph. Cyclic sub-graphs are those that contain at
|
|
|
|
|
// least one strongly connected component (SCC) plus any other vertices that feed or sink from
|
|
|
|
|
// the SCCs, up to a variable boundary. This means that the returned graphs are guaranteed to
|
|
|
|
|
// be cyclic, but they are not guaranteed to be strongly connected (however, they are always
|
|
|
|
|
// at least weakly connected). Trivial SCCs that are acyclic (i.e.: vertices that are not part
|
|
|
|
|
// of a cycle) are left in 'this' graph. This means that at the end 'this' graph is guaranteed
|
|
|
|
|
// to be a DAG (acyclic). 'this' will not necessarily be a connected graph at the end, even if
|
|
|
|
|
// it was originally connected.
|
|
|
|
|
std::vector<std::unique_ptr<DfgGraph>>
|
2025-08-20 19:21:24 +02:00
|
|
|
extractCyclicComponents(const std::string& label) VL_MT_DISABLED;
|
2024-03-26 00:06:25 +01:00
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
//-----------------------------------------------------------------------
|
|
|
|
|
// Debug dumping
|
|
|
|
|
|
2024-03-26 00:06:25 +01:00
|
|
|
// Dump graph in Graphviz format into the given stream 'os'. 'label' is added to the name of
|
|
|
|
|
// the graph which is included in the output.
|
2025-08-10 10:19:16 +02:00
|
|
|
// If the predicate function 'p' is provided, only those vertices are dumped that satifty it.
|
|
|
|
|
void dumpDot(std::ostream& os, const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
|
2024-07-14 15:34:54 +02:00
|
|
|
// Dump graph in Graphviz format into a new file with the given 'filename'. 'label' is added to
|
2024-03-26 00:06:25 +01:00
|
|
|
// the name of the graph which is included in the output.
|
2025-08-10 10:19:16 +02:00
|
|
|
// If the predicate function 'p' is provided, only those vertices are dumped that satifty it.
|
|
|
|
|
void dumpDotFile(const std::string& filename, const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
|
|
|
|
|
// Same as dumpDotFile, but returns the contents as a string.
|
|
|
|
|
std::string dumpDotString(const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
|
2024-03-26 00:06:25 +01:00
|
|
|
// Dump graph in Graphviz format into a new automatically numbered debug file. 'label' is
|
|
|
|
|
// added to the name of the graph, which is included in the file name and the output.
|
2025-08-10 10:19:16 +02:00
|
|
|
// If the predicate function 'p' is provided, only those vertices are dumped that satifty it.
|
|
|
|
|
void dumpDotFilePrefixed(const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
|
|
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Returns the set of vertices in the upstream cones of the given vertices
|
|
|
|
|
std::unique_ptr<std::unordered_set<const DfgVertex*>>
|
2025-08-20 19:21:24 +02:00
|
|
|
sourceCone(const std::vector<const DfgVertex*>&) const VL_MT_DISABLED;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Returns the set of vertices in the downstream cones of the given vertices
|
|
|
|
|
std::unique_ptr<std::unordered_set<const DfgVertex*>>
|
2025-08-20 19:21:24 +02:00
|
|
|
sinkCone(const std::vector<const DfgVertex*>&) const VL_MT_DISABLED;
|
2025-09-02 17:50:40 +02:00
|
|
|
};
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
2026-03-26 11:04:31 +01:00
|
|
|
namespace V3Dfg {
|
|
|
|
|
//-----------------------------------------------------------------------
|
|
|
|
|
// Functions for compatibility tests
|
|
|
|
|
|
|
|
|
|
// Returns true if variable can be represented in the graph
|
|
|
|
|
inline bool isSupported(const AstVar* varp) {
|
|
|
|
|
if (varp->isIfaceRef()) return false; // Cannot handle interface references
|
|
|
|
|
if (varp->delayp()) return false; // Cannot handle delayed variables
|
|
|
|
|
if (varp->isSc()) return false; // SystemC variables are special and rare, we can ignore
|
|
|
|
|
if (varp->dfgMultidriven()) return false; // Discovered as multidriven on earlier DFG run
|
|
|
|
|
if (DfgVertexVar::hasRWRefs(varp)) return false; // Referenced via READWRITE references
|
|
|
|
|
return DfgDataType::fromAst(varp->dtypep());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Returns true if variable can be represented in the graph
|
|
|
|
|
inline bool isSupported(const AstVarScope* vscp) {
|
|
|
|
|
const AstNodeModule* const modp = vscp->scopep()->modp();
|
|
|
|
|
if (VN_IS(modp, Module)) {
|
|
|
|
|
// Regular module supported
|
|
|
|
|
} else if (const AstIface* const ifacep = VN_CAST(modp, Iface)) {
|
|
|
|
|
// Interfaces supported if there are no virtual interfaces for
|
|
|
|
|
// them, otherwise they cannot be resovled statically.
|
|
|
|
|
if (ifacep->hasVirtualRef()) return false;
|
|
|
|
|
} else {
|
|
|
|
|
return false; // Anything else (package, class, etc) not supported
|
|
|
|
|
}
|
|
|
|
|
if (DfgVertexVar::hasRWRefs(vscp)) return false; // Referenced via READWRITE references
|
|
|
|
|
// Check the AstVar
|
|
|
|
|
return isSupported(vscp->varp());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
} //namespace V3Dfg
|
|
|
|
|
|
2025-09-02 23:21:24 +02:00
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Map from DfgVertices to T_Value implemeneted via DfgVertex::m_userStorage
|
|
|
|
|
|
|
|
|
|
// Base class with common behavour
|
|
|
|
|
class DfgUserMapBase VL_NOT_FINAL {
|
|
|
|
|
template <typename, bool>
|
|
|
|
|
friend class DfgUserMap;
|
|
|
|
|
|
|
|
|
|
protected:
|
|
|
|
|
// STATE
|
|
|
|
|
const DfgGraph* m_dfgp; // The graph this map is for
|
|
|
|
|
// The current generation number
|
|
|
|
|
const uint32_t m_currentGeneration;
|
|
|
|
|
|
|
|
|
|
// CONSTRUCTOR
|
|
|
|
|
explicit DfgUserMapBase(const DfgGraph* dfgp)
|
|
|
|
|
: m_dfgp{dfgp}
|
|
|
|
|
, m_currentGeneration{++m_dfgp->m_vertexUserGeneration} {
|
2025-12-29 03:30:16 +01:00
|
|
|
UASSERT(m_currentGeneration, "DfgGraph user data generation number overflow");
|
2025-09-02 23:21:24 +02:00
|
|
|
UASSERT(!m_dfgp->m_vertexUserInUse, "DfgUserMap already in use for this DfgGraph");
|
|
|
|
|
m_dfgp->m_vertexUserInUse = true;
|
|
|
|
|
}
|
|
|
|
|
VL_UNCOPYABLE(DfgUserMapBase);
|
|
|
|
|
DfgUserMapBase(DfgUserMapBase&& that)
|
|
|
|
|
: m_dfgp{that.m_dfgp}
|
|
|
|
|
, m_currentGeneration{that.m_currentGeneration} {
|
|
|
|
|
that.m_dfgp = nullptr;
|
|
|
|
|
}
|
|
|
|
|
DfgUserMapBase& operator=(DfgUserMapBase&&) = delete;
|
|
|
|
|
|
|
|
|
|
public:
|
|
|
|
|
~DfgUserMapBase() {
|
|
|
|
|
if (m_dfgp) m_dfgp->m_vertexUserInUse = false;
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
// Specialization where T_Value fits in DfgVertex::m_userStorage directly
|
|
|
|
|
template <typename T_Value>
|
|
|
|
|
class DfgUserMap<T_Value, true> final : public DfgUserMapBase {
|
|
|
|
|
static_assert(fitsSpaceAllocatedFor<T_Value, decltype(DfgVertex::m_userStorage)>(),
|
|
|
|
|
"'T_Value' does not fit 'DfgVertex::m_userStorage'");
|
|
|
|
|
friend class DfgGraph;
|
|
|
|
|
|
|
|
|
|
// CONSTRUCTOR
|
|
|
|
|
explicit DfgUserMap(const DfgGraph* dfgp)
|
|
|
|
|
: DfgUserMapBase{dfgp} {}
|
|
|
|
|
VL_UNCOPYABLE(DfgUserMap);
|
|
|
|
|
DfgUserMap& operator=(DfgUserMap&&) = delete;
|
|
|
|
|
|
|
|
|
|
public:
|
|
|
|
|
DfgUserMap(DfgUserMap&&) = default;
|
|
|
|
|
~DfgUserMap() = default;
|
|
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
// Retrieve mapped value for 'vtx', value initializing it on first access
|
|
|
|
|
T_Value& operator[](const DfgVertex& vtx) {
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
|
|
|
|
|
#endif
|
|
|
|
|
T_Value* const storagep = reinterpret_cast<T_Value*>(&vtx.m_userStorage);
|
|
|
|
|
if (vtx.m_userGeneration != m_currentGeneration) {
|
|
|
|
|
new (storagep) T_Value{};
|
|
|
|
|
vtx.m_userGeneration = m_currentGeneration;
|
|
|
|
|
}
|
|
|
|
|
return *storagep;
|
|
|
|
|
}
|
|
|
|
|
// Same as above with pointer as key
|
|
|
|
|
T_Value& operator[](const DfgVertex* vtxp) { return (*this)[*vtxp]; }
|
|
|
|
|
|
|
|
|
|
// Retrieve mapped value of 'vtx', must be alerady present
|
|
|
|
|
T_Value& at(const DfgVertex& vtx) const {
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
|
|
|
|
|
#endif
|
|
|
|
|
UASSERT_OBJ(vtx.m_userGeneration == m_currentGeneration, &vtx, "Vertex not in map");
|
|
|
|
|
T_Value* const storagep = reinterpret_cast<T_Value*>(&vtx.m_userStorage);
|
|
|
|
|
return *storagep;
|
|
|
|
|
}
|
|
|
|
|
// Same as above with pointer as key
|
|
|
|
|
T_Value& at(const DfgVertex* vtxp) const { return (*this).at(*vtxp); }
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
// Specialization where T_Value does not fit in DfgVertex::m_userStorage directly
|
|
|
|
|
template <typename T_Value>
|
|
|
|
|
class DfgUserMap<T_Value, false> final : public DfgUserMapBase {
|
|
|
|
|
static_assert(fitsSpaceAllocatedFor<T_Value*, decltype(DfgVertex::m_userStorage)>(),
|
|
|
|
|
"'T_Value*' does not fit 'DfgVertex::m_userStorage'");
|
|
|
|
|
friend class DfgGraph;
|
|
|
|
|
|
|
|
|
|
// STATE
|
|
|
|
|
std::deque<T_Value> m_storage; // Storage for T_Value instances
|
|
|
|
|
|
|
|
|
|
// CONSTRUCTOR
|
|
|
|
|
explicit DfgUserMap(const DfgGraph* dfgp)
|
|
|
|
|
: DfgUserMapBase{dfgp} {}
|
|
|
|
|
VL_UNCOPYABLE(DfgUserMap);
|
|
|
|
|
DfgUserMap& operator=(DfgUserMap&&) = delete;
|
|
|
|
|
|
|
|
|
|
public:
|
|
|
|
|
DfgUserMap(DfgUserMap&&) = default;
|
|
|
|
|
~DfgUserMap() = default;
|
|
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
// Retrieve mapped value for 'vtx', value initializing it on first access
|
|
|
|
|
T_Value& operator[](const DfgVertex& vtx) {
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
|
|
|
|
|
#endif
|
|
|
|
|
T_Value*& storagepr = reinterpret_cast<T_Value*&>(vtx.m_userStorage);
|
|
|
|
|
if (vtx.m_userGeneration != m_currentGeneration) {
|
|
|
|
|
m_storage.emplace_back();
|
|
|
|
|
storagepr = &m_storage.back();
|
|
|
|
|
vtx.m_userGeneration = m_currentGeneration;
|
|
|
|
|
}
|
|
|
|
|
return *storagepr;
|
|
|
|
|
}
|
|
|
|
|
// Same as above with pointer as key
|
|
|
|
|
T_Value& operator[](const DfgVertex* vtxp) { return (*this)[*vtxp]; }
|
|
|
|
|
|
|
|
|
|
// Retrieve mapped value of 'vtx', must be alerady present
|
|
|
|
|
T_Value& at(const DfgVertex& vtx) const {
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
|
|
|
|
|
#endif
|
|
|
|
|
UASSERT_OBJ(vtx.m_userGeneration == m_currentGeneration, &vtx, "Vertex not in map");
|
|
|
|
|
return *reinterpret_cast<T_Value*&>(vtx.m_userStorage);
|
|
|
|
|
}
|
|
|
|
|
// Same as above with pointer as key
|
|
|
|
|
T_Value& at(const DfgVertex* vtxp) const { return (*this).at(*vtxp); }
|
|
|
|
|
};
|
|
|
|
|
|
2025-09-03 14:55:33 +02:00
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Worklist for processing DfgVertices, implemented via DfgUserMap
|
|
|
|
|
|
|
|
|
|
class DfgWorklist final {
|
|
|
|
|
// STATE
|
|
|
|
|
|
|
|
|
|
// The Graph being processed
|
|
|
|
|
DfgGraph& m_dfg;
|
|
|
|
|
// Map from vertex to next vertex in the work list
|
|
|
|
|
DfgUserMap<DfgVertex*> m_nextp = m_dfg.makeUserMap<DfgVertex*>();
|
|
|
|
|
// We want all 'nextp' pointers for vertices that are in the worklist to be
|
|
|
|
|
// non-zero (including that of the last element). This allows us to do two
|
|
|
|
|
// important things: detect if an element is in the list by checking for a
|
|
|
|
|
// non-zero 'nextp'', and easy prefetching without conditionals. The
|
|
|
|
|
// address of the worklist itself is a good sentinel as it is a valid
|
|
|
|
|
// memory address, and we can easily check for the end of the list.
|
|
|
|
|
DfgVertex* const m_sentinelp = reinterpret_cast<DfgVertex*>(this);
|
|
|
|
|
// Head of work list
|
|
|
|
|
DfgVertex* m_headp = m_sentinelp;
|
|
|
|
|
|
|
|
|
|
public:
|
|
|
|
|
// CONSTRUCTOR
|
2025-09-05 09:14:48 +02:00
|
|
|
explicit DfgWorklist(DfgGraph& dfg)
|
2025-09-03 14:55:33 +02:00
|
|
|
: m_dfg{dfg} {}
|
|
|
|
|
VL_UNCOPYABLE(DfgWorklist);
|
|
|
|
|
VL_UNMOVABLE(DfgWorklist);
|
|
|
|
|
~DfgWorklist() = default;
|
|
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
|
|
|
|
|
// If 'vtx' is not in the worklist already, add it at the head of the list
|
|
|
|
|
// and return ture. If 'vtx' is already in the work list, then do nothing
|
|
|
|
|
// and return false.
|
|
|
|
|
bool push_front(DfgVertex& vtx) {
|
|
|
|
|
// Pick up reference to the next pointer
|
|
|
|
|
DfgVertex*& nextpr = m_nextp[vtx];
|
|
|
|
|
// If already in work list then nothing to do
|
|
|
|
|
if (nextpr) return false;
|
|
|
|
|
// Prepend to work list
|
|
|
|
|
nextpr = m_headp;
|
|
|
|
|
m_headp = &vtx;
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Returns ture iff 'vtx' is in the worklist
|
|
|
|
|
bool contains(const DfgVertex& vtx) { return m_nextp[vtx]; }
|
|
|
|
|
|
|
|
|
|
// Process the worklist by removing the first element, calling on it the
|
|
|
|
|
// given callable 'f', and repeat until the worklist is empty. The callable
|
|
|
|
|
// 'f' can add furthere vertices to the worklist.
|
|
|
|
|
template <typename T_Callable>
|
|
|
|
|
void foreach(T_Callable&& f) {
|
|
|
|
|
static_assert(vlstd::is_invocable_r<void, T_Callable, DfgVertex&>::value,
|
|
|
|
|
"T_Callable 'f' must have a signature compatible with 'void(DfgVertex&)'");
|
|
|
|
|
|
|
|
|
|
// Process the work list
|
|
|
|
|
while (m_headp != m_sentinelp) {
|
|
|
|
|
// Pick up the head
|
|
|
|
|
DfgVertex& vtx = *m_headp;
|
|
|
|
|
// Detach the head
|
|
|
|
|
m_headp = m_nextp.at(vtx);
|
|
|
|
|
// Prefetch next item
|
|
|
|
|
VL_PREFETCH_RW(m_headp);
|
|
|
|
|
// This item is now off the work list
|
|
|
|
|
m_nextp.at(vtx) = nullptr;
|
|
|
|
|
// Apply 'f'
|
|
|
|
|
f(vtx);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Inline method definitions
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// DfgEdge {{{
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
void DfgEdge::unlinkSrcp() {
|
|
|
|
|
if (!m_srcp) return;
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
bool contained = false;
|
|
|
|
|
for (const DfgEdge& edge : m_srcp->m_sinks) {
|
|
|
|
|
if (&edge != this) continue;
|
|
|
|
|
contained = true;
|
|
|
|
|
break;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
UASSERT_OBJ(contained, m_srcp, "'m_srcp' does not have this as sink");
|
|
|
|
|
#endif
|
|
|
|
|
m_srcp->m_sinks.unlink(this);
|
|
|
|
|
m_srcp = nullptr;
|
|
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
void DfgEdge::relinkSrcp(DfgVertex* srcp) {
|
|
|
|
|
// Unlink current source, if any
|
|
|
|
|
unlinkSrcp();
|
|
|
|
|
m_srcp = srcp;
|
|
|
|
|
if (m_srcp) m_srcp->m_sinks.linkFront(this);
|
|
|
|
|
}
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// }}}
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-09-10 13:38:49 +02:00
|
|
|
// DfgVertex {{{
|
|
|
|
|
|
|
|
|
|
bool DfgVertex::isCheaperThanLoad() const {
|
|
|
|
|
// Array sels are just address computation
|
|
|
|
|
if (is<DfgArraySel>()) return true;
|
|
|
|
|
// Small constant select from variable
|
|
|
|
|
if (const DfgSel* const selp = cast<DfgSel>()) {
|
|
|
|
|
if (!selp->fromp()->is<DfgVarPacked>()) return false;
|
|
|
|
|
if (selp->fromp()->width() <= VL_QUADSIZE) return true;
|
|
|
|
|
const uint32_t lsb = selp->lsb();
|
|
|
|
|
const uint32_t msb = lsb + selp->width() - 1;
|
|
|
|
|
return VL_BITWORD_E(msb) == VL_BITWORD_E(lsb);
|
|
|
|
|
}
|
|
|
|
|
// Zero extend of a cheap vertex - Extend(_) was converted to Concat(0, _)
|
|
|
|
|
if (const DfgConcat* const catp = cast<DfgConcat>()) {
|
|
|
|
|
if (catp->width() > VL_QUADSIZE) return false;
|
|
|
|
|
const DfgConst* const lCatp = catp->lhsp()->cast<DfgConst>();
|
|
|
|
|
if (!lCatp) return false;
|
|
|
|
|
if (!lCatp->isZero()) return false;
|
|
|
|
|
return catp->rhsp()->isCheaperThanLoad();
|
|
|
|
|
}
|
2026-03-27 08:47:50 +01:00
|
|
|
// Reduction of a cheap vertex
|
|
|
|
|
if (const DfgRedOr* const redOrp = cast<DfgRedOr>()) {
|
|
|
|
|
return redOrp->srcp()->isCheaperThanLoad();
|
|
|
|
|
}
|
|
|
|
|
if (const DfgRedAnd* const redAndp = cast<DfgRedAnd>()) {
|
|
|
|
|
return redAndp->srcp()->isCheaperThanLoad();
|
|
|
|
|
}
|
|
|
|
|
if (const DfgRedXor* const redXorp = cast<DfgRedXor>()) {
|
|
|
|
|
return redXorp->srcp()->isCheaperThanLoad();
|
|
|
|
|
}
|
2025-09-10 13:38:49 +02:00
|
|
|
// Otherwise probably not
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// }}}
|
|
|
|
|
|
2025-09-02 23:21:24 +02:00
|
|
|
// DfgGraph {{{
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
template <typename T_User>
|
2025-09-02 23:21:24 +02:00
|
|
|
DfgUserMap<T_User> DfgGraph::makeUserMap() const {
|
|
|
|
|
return DfgUserMap<T_User>{this};
|
2022-10-07 17:13:01 +02:00
|
|
|
}
|
2024-03-26 00:06:25 +01:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
// }}}
|
2024-03-26 00:06:25 +01:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#endif
|