Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// -*- mode: C++; c-file-style: "cc-mode" -*-
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
// DESCRIPTION: Verilator: Data flow graph (DFG) representation of logic
|
|
|
|
|
//
|
|
|
|
|
// Code available from: https://verilator.org
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
2025-01-01 14:30:25 +01:00
|
|
|
// Copyright 2003-2025 by Wilson Snyder. This program is free software; you
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// can redistribute it and/or modify it under the terms of either the GNU
|
|
|
|
|
// Lesser General Public License Version 3 or the Perl Artistic License
|
|
|
|
|
// Version 2.0.
|
|
|
|
|
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
|
2023-10-18 12:37:46 +02:00
|
|
|
#include "V3PchAstNoMT.h" // VL_MT_DISABLED_CODE_UNIT
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3Dfg.h"
|
|
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
#include "V3EmitV.h"
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3File.h"
|
|
|
|
|
|
2022-09-28 15:42:18 +02:00
|
|
|
VL_DEFINE_DEBUG_FUNCTIONS;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
//------------------------------------------------------------------------------
|
2025-09-02 17:50:40 +02:00
|
|
|
// V3Dfg
|
|
|
|
|
|
|
|
|
|
// predicate for supported data types
|
|
|
|
|
static bool dfgGraphIsSupportedDTypePacked(const AstNodeDType* dtypep) {
|
|
|
|
|
dtypep = dtypep->skipRefp();
|
|
|
|
|
if (const AstBasicDType* const typep = VN_CAST(dtypep, BasicDType)) {
|
|
|
|
|
return typep->keyword().isIntNumeric();
|
|
|
|
|
}
|
|
|
|
|
if (const AstPackArrayDType* const typep = VN_CAST(dtypep, PackArrayDType)) {
|
|
|
|
|
return dfgGraphIsSupportedDTypePacked(typep->subDTypep());
|
|
|
|
|
}
|
|
|
|
|
if (const AstNodeUOrStructDType* const typep = VN_CAST(dtypep, NodeUOrStructDType)) {
|
|
|
|
|
return typep->packed();
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
bool V3Dfg::isSupported(const AstNodeDType* dtypep) {
|
|
|
|
|
dtypep = dtypep->skipRefp();
|
|
|
|
|
// Support 1 dimensional unpacked arrays of packed types
|
|
|
|
|
if (const AstUnpackArrayDType* const typep = VN_CAST(dtypep, UnpackArrayDType)) {
|
|
|
|
|
return dfgGraphIsSupportedDTypePacked(typep->subDTypep());
|
|
|
|
|
}
|
|
|
|
|
// Support packed types
|
|
|
|
|
return dfgGraphIsSupportedDTypePacked(dtypep);
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
//------------------------------------------------------------------------------
|
2025-09-02 17:50:40 +02:00
|
|
|
// DfgGraph
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
DfgGraph::DfgGraph(AstModule* modulep, const string& name)
|
|
|
|
|
: m_modulep{modulep}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
, m_name{name} {}
|
|
|
|
|
|
|
|
|
|
DfgGraph::~DfgGraph() {
|
2025-09-02 17:50:40 +02:00
|
|
|
forEachVertex([&](DfgVertex& vtx) { vtx.unlinkDelete(*this); });
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2025-07-10 19:46:45 +02:00
|
|
|
std::unique_ptr<DfgGraph> DfgGraph::clone() const {
|
|
|
|
|
const bool scoped = !modulep();
|
|
|
|
|
|
|
|
|
|
DfgGraph* const clonep = new DfgGraph{modulep(), name()};
|
|
|
|
|
|
|
|
|
|
// Map from original vertex to clone
|
|
|
|
|
std::unordered_map<const DfgVertex*, DfgVertex*> vtxp2clonep(size() * 2);
|
|
|
|
|
|
|
|
|
|
// Clone constVertices
|
|
|
|
|
for (const DfgConst& vtx : m_constVertices) {
|
|
|
|
|
DfgConst* const cp = new DfgConst{*clonep, vtx.fileline(), vtx.num()};
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
}
|
|
|
|
|
// Clone variable vertices
|
|
|
|
|
for (const DfgVertexVar& vtx : m_varVertices) {
|
|
|
|
|
const DfgVertexVar* const vp = vtx.as<DfgVertexVar>();
|
|
|
|
|
DfgVertexVar* cp = nullptr;
|
|
|
|
|
|
|
|
|
|
switch (vtx.type()) {
|
|
|
|
|
case VDfgType::atVarArray: {
|
|
|
|
|
if (scoped) {
|
|
|
|
|
cp = new DfgVarArray{*clonep, vp->varScopep()};
|
|
|
|
|
} else {
|
|
|
|
|
cp = new DfgVarArray{*clonep, vp->varp()};
|
|
|
|
|
}
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
case VDfgType::atVarPacked: {
|
|
|
|
|
if (scoped) {
|
|
|
|
|
cp = new DfgVarPacked{*clonep, vp->varScopep()};
|
|
|
|
|
} else {
|
|
|
|
|
cp = new DfgVarPacked{*clonep, vp->varp()};
|
|
|
|
|
}
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
default: {
|
|
|
|
|
vtx.v3fatalSrc("Unhandled variable vertex type: " + vtx.typeName());
|
|
|
|
|
VL_UNREACHABLE;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
|
|
|
|
if (AstNode* const tmpForp = vp->tmpForp()) cp->tmpForp(tmpForp);
|
2025-07-10 19:46:45 +02:00
|
|
|
}
|
|
|
|
|
// Clone operation vertices
|
|
|
|
|
for (const DfgVertex& vtx : m_opVertices) {
|
|
|
|
|
switch (vtx.type()) {
|
|
|
|
|
#include "V3Dfg__gen_clone_cases.h" // From ./astgen
|
|
|
|
|
case VDfgType::atSel: {
|
|
|
|
|
DfgSel* const cp = new DfgSel{*clonep, vtx.fileline(), vtx.dtypep()};
|
|
|
|
|
cp->lsb(vtx.as<DfgSel>()->lsb());
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
case VDfgType::atUnitArray: {
|
|
|
|
|
DfgUnitArray* const cp = new DfgUnitArray{*clonep, vtx.fileline(), vtx.dtypep()};
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
2025-07-10 19:46:45 +02:00
|
|
|
case VDfgType::atMux: {
|
|
|
|
|
DfgMux* const cp = new DfgMux{*clonep, vtx.fileline(), vtx.dtypep()};
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
2025-07-14 23:09:34 +02:00
|
|
|
case VDfgType::atSpliceArray: {
|
|
|
|
|
DfgSpliceArray* const cp = new DfgSpliceArray{*clonep, vtx.fileline(), vtx.dtypep()};
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
case VDfgType::atSplicePacked: {
|
|
|
|
|
DfgSplicePacked* const cp = new DfgSplicePacked{*clonep, vtx.fileline(), vtx.dtypep()};
|
|
|
|
|
vtxp2clonep.emplace(&vtx, cp);
|
|
|
|
|
break;
|
|
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
case VDfgType::atLogic: {
|
|
|
|
|
vtx.v3fatalSrc("DfgLogic cannot be cloned");
|
|
|
|
|
VL_UNREACHABLE;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
case VDfgType::atUnresolved: {
|
|
|
|
|
vtx.v3fatalSrc("DfgUnresolved cannot be cloned");
|
|
|
|
|
VL_UNREACHABLE;
|
|
|
|
|
break;
|
|
|
|
|
}
|
2025-07-10 19:46:45 +02:00
|
|
|
default: {
|
|
|
|
|
vtx.v3fatalSrc("Unhandled operation vertex type: " + vtx.typeName());
|
|
|
|
|
VL_UNREACHABLE;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
UASSERT(size() == clonep->size(), "Size of clone should be the same");
|
|
|
|
|
|
|
|
|
|
// Constants have no inputs
|
|
|
|
|
// Hook up inputs of cloned variables
|
|
|
|
|
for (const DfgVertexVar& vtx : m_varVertices) {
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgVertexVar* const cp = vtxp2clonep.at(&vtx)->as<DfgVertexVar>();
|
|
|
|
|
if (const DfgVertex* const srcp = vtx.srcp()) cp->srcp(vtxp2clonep.at(srcp));
|
|
|
|
|
if (const DfgVertex* const defp = vtx.defaultp()) cp->defaultp(vtxp2clonep.at(defp));
|
2025-07-10 19:46:45 +02:00
|
|
|
}
|
|
|
|
|
// Hook up inputs of cloned operation vertices
|
|
|
|
|
for (const DfgVertex& vtx : m_opVertices) {
|
2025-07-14 23:09:34 +02:00
|
|
|
if (vtx.is<DfgVertexVariadic>()) {
|
|
|
|
|
switch (vtx.type()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
case VDfgType::atSpliceArray:
|
2025-07-14 23:09:34 +02:00
|
|
|
case VDfgType::atSplicePacked: {
|
2025-09-02 17:50:40 +02:00
|
|
|
const DfgVertexSplice* const vp = vtx.as<DfgVertexSplice>();
|
|
|
|
|
DfgVertexSplice* const cp = vtxp2clonep.at(vp)->as<DfgVertexSplice>();
|
|
|
|
|
vp->foreachDriver([&](const DfgVertex& src, uint32_t lo, FileLine* flp) {
|
|
|
|
|
cp->addDriver(vtxp2clonep.at(&src), lo, flp);
|
|
|
|
|
return false;
|
2025-07-14 23:09:34 +02:00
|
|
|
});
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
default: {
|
|
|
|
|
vtx.v3fatalSrc("Unhandled DfgVertexVariadic sub type: " + vtx.typeName());
|
|
|
|
|
VL_UNREACHABLE;
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
DfgVertex* const cp = vtxp2clonep.at(&vtx);
|
2025-09-02 17:50:40 +02:00
|
|
|
for (size_t i = 0; i < vtx.nInputs(); ++i) {
|
|
|
|
|
cp->inputp(i, vtxp2clonep.at(vtx.inputp(i)));
|
2025-07-10 19:46:45 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return std::unique_ptr<DfgGraph>{clonep};
|
|
|
|
|
}
|
|
|
|
|
|
2025-08-05 13:11:02 +02:00
|
|
|
void DfgGraph::mergeGraphs(std::vector<std::unique_ptr<DfgGraph>>&& otherps) {
|
|
|
|
|
if (otherps.empty()) return;
|
|
|
|
|
|
|
|
|
|
// NODE STATE
|
|
|
|
|
// AstVar/AstVarScope::user2p() -> corresponding DfgVertexVar* in 'this' graph
|
|
|
|
|
const VNUser2InUse user2InUse;
|
|
|
|
|
|
|
|
|
|
// Set up Ast Variable -> DfgVertexVar map for 'this' graph
|
|
|
|
|
for (DfgVertexVar& vtx : m_varVertices) vtx.nodep()->user2p(&vtx);
|
|
|
|
|
|
|
|
|
|
// Merge in each of the other graphs
|
|
|
|
|
for (const std::unique_ptr<DfgGraph>& otherp : otherps) {
|
|
|
|
|
// Process variables
|
|
|
|
|
for (DfgVertexVar* const vtxp : otherp->m_varVertices.unlinkable()) {
|
|
|
|
|
// Variabels that are present in 'this', make them use the DfgVertexVar in 'this'.
|
|
|
|
|
if (DfgVertexVar* const altp = vtxp->nodep()->user2u().to<DfgVertexVar*>()) {
|
|
|
|
|
DfgVertex* const srcp = vtxp->srcp();
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgVertex* const defaultp = vtxp->defaultp();
|
|
|
|
|
UASSERT_OBJ(!(srcp || defaultp) || (!altp->srcp() && !altp->defaultp()), vtxp,
|
|
|
|
|
"At most one alias should be driven");
|
2025-08-05 13:11:02 +02:00
|
|
|
vtxp->replaceWith(altp);
|
|
|
|
|
if (srcp) altp->srcp(srcp);
|
2025-09-02 17:50:40 +02:00
|
|
|
if (defaultp) altp->defaultp(defaultp);
|
2025-08-05 13:11:02 +02:00
|
|
|
VL_DO_DANGLING(vtxp->unlinkDelete(*otherp), vtxp);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
// Otherwise they will be moved
|
|
|
|
|
vtxp->nodep()->user2p(vtxp);
|
2025-09-02 23:21:24 +02:00
|
|
|
vtxp->m_userGeneration = 0;
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
vtxp->m_dfgp = this;
|
|
|
|
|
#endif
|
2025-08-05 13:11:02 +02:00
|
|
|
}
|
|
|
|
|
m_varVertices.splice(m_varVertices.end(), otherp->m_varVertices);
|
|
|
|
|
// Process constants
|
|
|
|
|
for (DfgConst& vtx : otherp->m_constVertices) {
|
2025-09-02 23:21:24 +02:00
|
|
|
vtx.m_userGeneration = 0;
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
vtx.m_dfgp = this;
|
|
|
|
|
#endif
|
2025-08-05 13:11:02 +02:00
|
|
|
}
|
|
|
|
|
m_constVertices.splice(m_constVertices.end(), otherp->m_constVertices);
|
|
|
|
|
// Process operations
|
|
|
|
|
for (DfgVertex& vtx : otherp->m_opVertices) {
|
2025-09-02 23:21:24 +02:00
|
|
|
vtx.m_userGeneration = 0;
|
|
|
|
|
#ifdef VL_DEBUG
|
|
|
|
|
vtx.m_dfgp = this;
|
|
|
|
|
#endif
|
2025-08-05 13:11:02 +02:00
|
|
|
}
|
|
|
|
|
m_opVertices.splice(m_opVertices.end(), otherp->m_opVertices);
|
|
|
|
|
// Update graph sizes
|
|
|
|
|
m_size += otherp->m_size;
|
|
|
|
|
otherp->m_size = 0;
|
2024-03-26 00:06:25 +01:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2025-06-16 13:25:44 +02:00
|
|
|
std::string DfgGraph::makeUniqueName(const std::string& prefix, size_t n) {
|
|
|
|
|
// Construct the tmpNameStub if we have not done so yet
|
|
|
|
|
if (m_tmpNameStub.empty()) {
|
|
|
|
|
// Use the hash of the graph name (avoid long names and non-identifiers)
|
2025-08-20 19:21:24 +02:00
|
|
|
const std::string hash = V3Hash{m_name}.toString();
|
2025-06-16 13:25:44 +02:00
|
|
|
// We need to keep every variable globally unique, and graph hashed
|
|
|
|
|
// names might not be, so keep a static table to track multiplicity
|
|
|
|
|
static std::unordered_map<std::string, uint32_t> s_multiplicity;
|
2025-08-20 19:21:24 +02:00
|
|
|
m_tmpNameStub += '_' + hash + '_' + std::to_string(s_multiplicity[hash]++) + '_';
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
|
|
|
|
// Assemble the globally unique name
|
|
|
|
|
return "__Vdfg" + prefix + m_tmpNameStub + std::to_string(n);
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
DfgVertexVar* DfgGraph::makeNewVar(FileLine* flp, const std::string& name, AstNodeDType* dtypep,
|
|
|
|
|
AstScope* scopep) {
|
|
|
|
|
UASSERT_OBJ(!!scopep != !!modulep(), flp,
|
|
|
|
|
"makeNewVar scopep should only be provided for a scoped DfgGraph");
|
|
|
|
|
|
|
|
|
|
// Create AstVar
|
2025-06-16 13:25:44 +02:00
|
|
|
AstVar* const varp = new AstVar{flp, VVarType::MODULETEMP, name, dtypep};
|
|
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
if (scopep) {
|
|
|
|
|
// Add AstVar to the scope's module
|
|
|
|
|
scopep->modp()->addStmtsp(varp);
|
|
|
|
|
// Create AstVarScope
|
|
|
|
|
AstVarScope* const vscp = new AstVarScope{flp, scopep, varp};
|
|
|
|
|
// Add to scope
|
|
|
|
|
scopep->addVarsp(vscp);
|
|
|
|
|
// Create and return the corresponding variable vertex
|
|
|
|
|
if (VN_IS(varp->dtypeSkipRefp(), UnpackArrayDType)) return new DfgVarArray{*this, vscp};
|
|
|
|
|
return new DfgVarPacked{*this, vscp};
|
|
|
|
|
} else {
|
|
|
|
|
// Add AstVar to containing module
|
|
|
|
|
modulep()->addStmtsp(varp);
|
|
|
|
|
// Create and return the corresponding variable vertex
|
|
|
|
|
if (VN_IS(varp->dtypeSkipRefp(), UnpackArrayDType)) return new DfgVarArray{*this, varp};
|
|
|
|
|
return new DfgVarPacked{*this, varp};
|
|
|
|
|
}
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
|
|
|
|
|
2025-08-10 10:19:16 +02:00
|
|
|
static const std::string toDotId(const DfgVertex& vtx) { return '"' + cvtToHex(&vtx) + '"'; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Dump one DfgVertex in Graphviz format
|
|
|
|
|
static void dumpDotVertex(std::ostream& os, const DfgVertex& vtx) {
|
2025-09-02 17:50:40 +02:00
|
|
|
if (const DfgVertexVar* const varVtxp = vtx.cast<DfgVertexVar>()) {
|
2025-08-20 19:21:24 +02:00
|
|
|
const AstNode* const nodep = varVtxp->nodep();
|
|
|
|
|
const AstVar* const varp = varVtxp->varp();
|
2022-09-30 17:19:21 +02:00
|
|
|
os << toDotId(vtx);
|
2025-09-02 17:50:40 +02:00
|
|
|
// Begin attributes
|
|
|
|
|
os << " [";
|
|
|
|
|
// Begin 'label'
|
|
|
|
|
os << "label=\"";
|
|
|
|
|
// Name
|
|
|
|
|
os << nodep->prettyName();
|
|
|
|
|
// Address
|
|
|
|
|
os << '\n' << cvtToHex(varVtxp);
|
|
|
|
|
// Original variable, if any
|
2025-08-20 19:21:24 +02:00
|
|
|
if (const AstNode* const tmpForp = varVtxp->tmpForp()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
if (tmpForp != nodep) os << "\ntemporary for: " << tmpForp->prettyName();
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
// Type and fanout
|
|
|
|
|
os << '\n';
|
2025-07-21 18:33:12 +02:00
|
|
|
varVtxp->dtypep()->dumpSmall(os);
|
2025-09-02 17:50:40 +02:00
|
|
|
os << " / F" << varVtxp->fanout();
|
|
|
|
|
// End 'label'
|
|
|
|
|
os << '"';
|
|
|
|
|
// Shape
|
|
|
|
|
if (varVtxp->is<DfgVarPacked>()) {
|
|
|
|
|
os << ", shape=box";
|
|
|
|
|
} else if (varVtxp->is<DfgVarArray>()) {
|
|
|
|
|
os << ", shape=box3d";
|
|
|
|
|
} else {
|
|
|
|
|
varVtxp->v3fatalSrc("Unhandled DfgVertexVar sub-type"); // LCOV_EXCL_LINE
|
|
|
|
|
}
|
|
|
|
|
// Color
|
2022-09-27 01:06:50 +02:00
|
|
|
if (varp->direction() == VDirection::INPUT) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=chartreuse2"; // Green
|
2022-09-27 01:06:50 +02:00
|
|
|
} else if (varp->direction() == VDirection::OUTPUT) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=cyan2"; // Cyan
|
2022-09-27 01:06:50 +02:00
|
|
|
} else if (varp->direction() == VDirection::INOUT) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=darkorchid2"; // Purple
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
} else if (varVtxp->hasExtRefs()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=firebrick2"; // Red
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
} else if (varVtxp->hasModRefs()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=darkorange1"; // Orange
|
2024-03-02 20:49:29 +01:00
|
|
|
} else if (varVtxp->hasDfgRefs()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=gold2"; // Yellow
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
} else if (varVtxp->tmpForp()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
os << ", style=filled, fillcolor=gray95";
|
2022-09-27 01:06:50 +02:00
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
// End attributes
|
2023-11-24 17:45:52 +01:00
|
|
|
os << "]\n";
|
2022-09-27 01:06:50 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (const DfgConst* const constVtxp = vtx.cast<DfgConst>()) {
|
2022-10-07 16:44:14 +02:00
|
|
|
const V3Number& num = constVtxp->num();
|
2022-09-30 17:19:21 +02:00
|
|
|
|
|
|
|
|
os << toDotId(vtx);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
os << " [label=\"";
|
|
|
|
|
if (num.width() <= 32 && !num.isSigned()) {
|
2025-08-05 12:12:01 +02:00
|
|
|
os << constVtxp->width() << "'d" << num.toUInt() << '\n';
|
|
|
|
|
os << constVtxp->width() << "'h" << std::hex << num.toUInt() << std::dec << '\n';
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
} else {
|
2025-08-05 12:12:01 +02:00
|
|
|
os << num.ascii() << '\n';
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2025-08-05 12:12:01 +02:00
|
|
|
os << cvtToHex(constVtxp) << '\n';
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
os << '"';
|
|
|
|
|
os << ", shape=plain";
|
2023-11-24 17:45:52 +01:00
|
|
|
os << "]\n";
|
2022-09-27 01:06:50 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:21 +02:00
|
|
|
if (const DfgSel* const selVtxp = vtx.cast<DfgSel>()) {
|
2022-10-06 19:34:18 +02:00
|
|
|
const uint32_t lsb = selVtxp->lsb();
|
|
|
|
|
const uint32_t msb = lsb + selVtxp->width() - 1;
|
|
|
|
|
os << toDotId(vtx);
|
2025-08-05 12:12:01 +02:00
|
|
|
os << " [label=\"SEL _[" << msb << ":" << lsb << "]\n";
|
|
|
|
|
os << cvtToHex(selVtxp) << '\n';
|
2025-07-21 18:33:12 +02:00
|
|
|
vtx.dtypep()->dumpSmall(os);
|
|
|
|
|
os << " / F" << vtx.fanout() << '"';
|
2022-10-06 19:34:18 +02:00
|
|
|
if (vtx.hasMultipleSinks()) {
|
|
|
|
|
os << ", shape=doublecircle";
|
|
|
|
|
} else {
|
|
|
|
|
os << ", shape=circle";
|
2022-09-30 17:19:21 +02:00
|
|
|
}
|
2023-11-24 17:45:52 +01:00
|
|
|
os << "]\n";
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
2022-09-30 17:19:21 +02:00
|
|
|
}
|
|
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
if (vtx.is<DfgVertexSplice>() || vtx.is<DfgUnitArray>() || vtx.is<DfgUnresolved>()) {
|
2025-07-14 23:09:34 +02:00
|
|
|
os << toDotId(vtx);
|
2025-08-05 12:12:01 +02:00
|
|
|
os << " [label=\"" << vtx.typeName() << '\n';
|
|
|
|
|
os << cvtToHex(&vtx) << '\n';
|
2025-07-21 18:33:12 +02:00
|
|
|
vtx.dtypep()->dumpSmall(os);
|
2025-07-14 23:09:34 +02:00
|
|
|
os << " / F" << vtx.fanout() << '"';
|
|
|
|
|
if (vtx.hasMultipleSinks()) {
|
|
|
|
|
os << ", shape=doubleoctagon";
|
|
|
|
|
} else {
|
|
|
|
|
os << ", shape=octagon";
|
|
|
|
|
}
|
|
|
|
|
os << "]\n";
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
if (const DfgLogic* const logicp = vtx.cast<DfgLogic>()) {
|
|
|
|
|
os << toDotId(vtx);
|
|
|
|
|
std::stringstream ss;
|
|
|
|
|
V3EmitV::debugVerilogForTree(logicp->nodep(), ss);
|
2025-09-02 17:50:40 +02:00
|
|
|
std::string str = ss.str();
|
|
|
|
|
str = VString::quoteBackslash(str);
|
|
|
|
|
str = VString::quoteAny(str, '"', '\\');
|
|
|
|
|
str = VString::replaceSubstr(str, "\n", "\\l");
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
os << " [label=\"";
|
2025-09-02 17:50:40 +02:00
|
|
|
os << str;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
os << "\\n" << cvtToHex(&vtx);
|
|
|
|
|
os << "\"\n";
|
|
|
|
|
os << ", shape=box, style=\"rounded,filled\", fillcolor=cornsilk, nojustify=true";
|
|
|
|
|
os << "]\n";
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:21 +02:00
|
|
|
os << toDotId(vtx);
|
2025-08-05 12:12:01 +02:00
|
|
|
os << " [label=\"" << vtx.typeName() << '\n';
|
|
|
|
|
os << cvtToHex(&vtx) << '\n';
|
2025-07-21 18:33:12 +02:00
|
|
|
vtx.dtypep()->dumpSmall(os);
|
|
|
|
|
os << " / F" << vtx.fanout() << '"';
|
2022-09-27 01:06:50 +02:00
|
|
|
if (vtx.hasMultipleSinks()) {
|
|
|
|
|
os << ", shape=doublecircle";
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
} else {
|
2022-09-27 01:06:50 +02:00
|
|
|
os << ", shape=circle";
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2023-11-24 17:45:52 +01:00
|
|
|
os << "]\n";
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2025-08-10 10:19:16 +02:00
|
|
|
void DfgGraph::dumpDot(std::ostream& os, const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p) const {
|
|
|
|
|
// This generates a graphviz dump, https://www.graphviz.org
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Header
|
2023-11-24 17:45:52 +01:00
|
|
|
os << "digraph dfg {\n";
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
os << "rankdir=LR\n";
|
|
|
|
|
|
|
|
|
|
// If predicate not given, dump everything
|
|
|
|
|
if (!p) p = [](const DfgVertex&) { return true; };
|
|
|
|
|
|
|
|
|
|
std::unordered_set<const DfgVertex*> emitted;
|
|
|
|
|
// Emit all vertices associated with a DfgLogic
|
|
|
|
|
forEachVertex([&](const DfgVertex& vtx) {
|
|
|
|
|
const DfgLogic* const logicp = vtx.cast<DfgLogic>();
|
|
|
|
|
if (!logicp) return;
|
|
|
|
|
if (logicp->synth().empty()) return;
|
|
|
|
|
if (!p(vtx)) return;
|
|
|
|
|
os << "subgraph cluster_" << cvtToHex(logicp) << " {\n";
|
|
|
|
|
dumpDotVertex(os, *logicp);
|
|
|
|
|
emitted.insert(logicp);
|
|
|
|
|
for (DfgVertex* const vtxp : logicp->synth()) {
|
|
|
|
|
if (!p(*vtxp)) continue;
|
|
|
|
|
dumpDotVertex(os, *vtxp);
|
|
|
|
|
emitted.insert(vtxp);
|
|
|
|
|
}
|
|
|
|
|
os << "}\n";
|
|
|
|
|
});
|
|
|
|
|
// Emit all remaining vertices
|
|
|
|
|
forEachVertex([&](const DfgVertex& vtx) {
|
|
|
|
|
if (emitted.count(&vtx)) return;
|
|
|
|
|
if (!p(vtx)) return;
|
|
|
|
|
dumpDotVertex(os, vtx);
|
|
|
|
|
});
|
|
|
|
|
// Emit all edges
|
2025-09-02 17:50:40 +02:00
|
|
|
forEachVertex([&](const DfgVertex& vtx) {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
if (!p(vtx)) return;
|
2025-09-02 17:50:40 +02:00
|
|
|
for (size_t i = 0; i < vtx.nInputs(); ++i) {
|
|
|
|
|
DfgVertex* const srcp = vtx.inputp(i);
|
|
|
|
|
if (!srcp) continue;
|
|
|
|
|
if (!p(*srcp)) continue;
|
|
|
|
|
os << toDotId(*srcp) << " -> " << toDotId(vtx);
|
|
|
|
|
os << " [headlabel=\"" << vtx.srcName(i) << "\"]";
|
|
|
|
|
os << '\n';
|
|
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
});
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Footer
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
os << "label=\"" << name() + (label.empty() ? "" : "-" + label) << "\"\n";
|
|
|
|
|
os << "labelloc=t\n";
|
|
|
|
|
os << "labeljust=l\n";
|
2023-11-24 17:45:52 +01:00
|
|
|
os << "}\n";
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2025-08-10 10:19:16 +02:00
|
|
|
std::string DfgGraph::dumpDotString(const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p) const {
|
|
|
|
|
std::stringstream ss;
|
|
|
|
|
dumpDot(ss, label, p);
|
|
|
|
|
return ss.str();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void DfgGraph::dumpDotFile(const std::string& filename, const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p) const {
|
2024-07-14 15:34:54 +02:00
|
|
|
const std::unique_ptr<std::ofstream> os{V3File::new_ofstream(filename)};
|
2025-03-24 00:51:54 +01:00
|
|
|
if (os->fail()) v3fatal("Can't write file: " << filename);
|
2025-08-10 10:19:16 +02:00
|
|
|
dumpDot(*os.get(), label, p);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
os->close();
|
|
|
|
|
}
|
|
|
|
|
|
2025-08-10 10:19:16 +02:00
|
|
|
void DfgGraph::dumpDotFilePrefixed(const std::string& label,
|
|
|
|
|
std::function<bool(const DfgVertex&)> p) const {
|
|
|
|
|
std::string filename = name();
|
2024-07-14 15:34:54 +02:00
|
|
|
if (!label.empty()) filename += "-" + label;
|
2025-08-10 10:19:16 +02:00
|
|
|
dumpDotFile(v3Global.debugFilename(filename) + ".dot", label, p);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
template <bool T_SinksNotSources>
|
|
|
|
|
static std::unique_ptr<std::unordered_set<const DfgVertex*>>
|
2025-08-20 19:21:24 +02:00
|
|
|
dfgGraphCollectCone(const std::vector<const DfgVertex*>& vtxps) {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Work queue for traversal starting from all the seed vertices
|
|
|
|
|
std::vector<const DfgVertex*> queue = vtxps;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Set of already visited vertices
|
2025-08-20 19:21:24 +02:00
|
|
|
std::unique_ptr<std::unordered_set<const DfgVertex*>> resp{
|
|
|
|
|
new std::unordered_set<const DfgVertex*>{}};
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Depth first traversal
|
|
|
|
|
while (!queue.empty()) {
|
|
|
|
|
// Pop next work item
|
2025-08-05 06:05:31 +02:00
|
|
|
const DfgVertex* const vtxp = queue.back();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
queue.pop_back();
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Mark vertex as visited, move on if already visited
|
|
|
|
|
if (!resp->insert(vtxp).second) continue;
|
|
|
|
|
// Enqueue all siblings of this vertex.
|
|
|
|
|
if VL_CONSTEXPR_CXX17 (T_SinksNotSources) {
|
2025-09-02 17:50:40 +02:00
|
|
|
vtxp->foreachSink([&](const DfgVertex& sink) {
|
|
|
|
|
queue.push_back(&sink);
|
|
|
|
|
return false;
|
|
|
|
|
});
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
} else {
|
2025-09-02 17:50:40 +02:00
|
|
|
vtxp->foreachSource([&](const DfgVertex& src) {
|
|
|
|
|
queue.push_back(&src);
|
|
|
|
|
return false;
|
|
|
|
|
});
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// Done
|
2025-08-20 19:21:24 +02:00
|
|
|
return resp;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
std::unique_ptr<std::unordered_set<const DfgVertex*>>
|
2025-08-20 19:21:24 +02:00
|
|
|
DfgGraph::sourceCone(const std::vector<const DfgVertex*>& vtxps) const {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
return dfgGraphCollectCone<false>(vtxps);
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
std::unique_ptr<std::unordered_set<const DfgVertex*>>
|
2025-08-20 19:21:24 +02:00
|
|
|
DfgGraph::sinkCone(const std::vector<const DfgVertex*>& vtxps) const {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
return dfgGraphCollectCone<true>(vtxps);
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// DfgVertex
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
DfgVertex::DfgVertex(DfgGraph& dfg, VDfgType type, FileLine* flp, AstNodeDType* dtypep)
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
: m_filelinep{flp}
|
|
|
|
|
, m_dtypep{dtypep}
|
|
|
|
|
, m_type{type} {
|
|
|
|
|
dfg.addVertex(*this);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
bool DfgVertex::equals(const DfgVertex& that, EqualsCache& cache) const {
|
2025-07-14 23:09:34 +02:00
|
|
|
// If same vertex, then equal
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (this == &that) return true;
|
2025-07-14 23:09:34 +02:00
|
|
|
|
|
|
|
|
// If different type, then not equal
|
2022-10-06 19:34:18 +02:00
|
|
|
if (this->type() != that.type()) return false;
|
2025-07-14 23:09:34 +02:00
|
|
|
|
|
|
|
|
// If different data type, then not equal
|
2022-10-06 19:34:18 +02:00
|
|
|
if (this->dtypep() != that.dtypep()) return false;
|
2025-07-14 23:09:34 +02:00
|
|
|
|
|
|
|
|
// If different number of inputs, then not equal
|
2025-09-02 17:50:40 +02:00
|
|
|
if (this->nInputs() != that.nInputs()) return false;
|
2025-07-14 23:09:34 +02:00
|
|
|
|
|
|
|
|
// Check vertex specifics
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (!this->selfEquals(that)) return false;
|
|
|
|
|
|
2025-07-14 23:09:34 +02:00
|
|
|
// Check sources
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
const auto key = (this < &that) ? EqualsCache::key_type{this, &that} //
|
|
|
|
|
: EqualsCache::key_type{&that, this};
|
2024-07-25 22:07:58 +02:00
|
|
|
// Note: the recursive invocation can cause a re-hash but that will not invalidate references
|
|
|
|
|
uint8_t& result = cache[key];
|
2022-10-06 12:26:11 +02:00
|
|
|
if (!result) {
|
2025-09-02 17:50:40 +02:00
|
|
|
const bool equal = [&]() {
|
|
|
|
|
for (size_t i = 0; i < nInputs(); ++i) {
|
|
|
|
|
const DfgVertex* const ap = this->inputp(i);
|
|
|
|
|
const DfgVertex* const bp = that.inputp(i);
|
|
|
|
|
if (!ap && !bp) continue;
|
|
|
|
|
if (!ap || !bp) return false;
|
|
|
|
|
if (!ap->equals(*bp, cache)) return false;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
return true;
|
|
|
|
|
}();
|
|
|
|
|
result = (static_cast<uint8_t>(equal) << 1) | 1;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-10-06 12:26:11 +02:00
|
|
|
return result >> 1;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2025-09-02 23:21:24 +02:00
|
|
|
V3Hash DfgVertex::hash(DfgUserMap<V3Hash>& cache) {
|
|
|
|
|
V3Hash& result = cache[this];
|
2022-10-06 12:26:11 +02:00
|
|
|
if (!result.value()) {
|
2022-10-21 11:50:02 +02:00
|
|
|
V3Hash hash{selfHash()};
|
|
|
|
|
// Variables are defined by themselves, so there is no need to hash them further
|
|
|
|
|
// (especially the sources). This enables sound hashing of graphs circular only through
|
|
|
|
|
// variables, which we rely on.
|
2022-10-04 12:03:41 +02:00
|
|
|
if (!is<DfgVertexVar>()) {
|
2022-10-21 11:50:02 +02:00
|
|
|
hash += m_type;
|
2025-09-02 17:50:40 +02:00
|
|
|
hash += size();
|
|
|
|
|
foreachSource([&](DfgVertex& vtx) {
|
2025-09-02 23:21:24 +02:00
|
|
|
hash += vtx.hash(cache);
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
|
|
|
|
});
|
2022-09-27 14:50:37 +02:00
|
|
|
}
|
2022-10-06 12:26:11 +02:00
|
|
|
result = hash;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
uint32_t DfgVertex::fanout() const {
|
|
|
|
|
uint32_t result = 0;
|
2025-09-02 17:50:40 +02:00
|
|
|
foreachSink([&](const DfgVertex&) {
|
|
|
|
|
++result;
|
|
|
|
|
return false;
|
|
|
|
|
});
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-14 23:09:34 +02:00
|
|
|
DfgVertexVar* DfgVertex::getResultVar() {
|
2025-06-16 13:25:44 +02:00
|
|
|
// It's easy if the vertex is already a variable ...
|
2025-07-14 23:09:34 +02:00
|
|
|
if (DfgVertexVar* const varp = this->cast<DfgVertexVar>()) return varp;
|
2025-06-16 13:25:44 +02:00
|
|
|
|
2025-07-14 23:09:34 +02:00
|
|
|
// Inspect existing variables written by this vertex, and choose one
|
|
|
|
|
DfgVertexVar* resp = nullptr;
|
2025-06-28 18:29:41 +02:00
|
|
|
// cppcheck-has-bug-suppress constParameter
|
2025-09-02 17:50:40 +02:00
|
|
|
this->foreachSink([&resp](DfgVertex& sink) {
|
2025-07-14 23:09:34 +02:00
|
|
|
DfgVertexVar* const varp = sink.cast<DfgVertexVar>();
|
2025-09-02 17:50:40 +02:00
|
|
|
if (!varp) return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
// First variable found
|
|
|
|
|
if (!resp) {
|
|
|
|
|
resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
2025-08-05 11:24:54 +02:00
|
|
|
|
2025-06-16 13:25:44 +02:00
|
|
|
// Prefer those variables that must be kept anyway
|
2025-08-05 11:24:54 +02:00
|
|
|
if (resp->hasExtRefs() != varp->hasExtRefs()) {
|
|
|
|
|
if (!resp->hasExtRefs()) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-08-05 11:24:54 +02:00
|
|
|
}
|
|
|
|
|
if (resp->hasModWrRefs() != varp->hasModWrRefs()) {
|
|
|
|
|
if (!resp->hasModWrRefs()) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-08-05 11:24:54 +02:00
|
|
|
}
|
|
|
|
|
if (resp->hasDfgRefs() != varp->hasDfgRefs()) {
|
|
|
|
|
if (!resp->hasDfgRefs()) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
|
|
|
|
// Prefer those that already have module references
|
2025-08-05 11:24:54 +02:00
|
|
|
if (resp->hasModRdRefs() != varp->hasModRdRefs()) {
|
|
|
|
|
if (!resp->hasModRdRefs()) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Prefer real variabels over temporaries
|
|
|
|
|
if (!resp->tmpForp() != !varp->tmpForp()) {
|
|
|
|
|
if (resp->tmpForp()) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
2025-06-16 13:25:44 +02:00
|
|
|
// Prefer the earlier one in source order
|
|
|
|
|
const FileLine& oldFlp = *(resp->fileline());
|
|
|
|
|
const FileLine& newFlp = *(varp->fileline());
|
|
|
|
|
if (const int cmp = oldFlp.operatorCompare(newFlp)) {
|
|
|
|
|
if (cmp > 0) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
|
|
|
|
// Prefer the one with the lexically smaller name
|
2025-07-01 23:55:08 +02:00
|
|
|
if (const int cmp = resp->nodep()->name().compare(varp->nodep()->name())) {
|
2025-06-16 13:25:44 +02:00
|
|
|
if (cmp > 0) resp = varp;
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
}
|
|
|
|
|
// 'resp' and 'varp' are all the same, keep using the existing 'resp'
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-06-16 13:25:44 +02:00
|
|
|
});
|
|
|
|
|
return resp;
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
AstScope* DfgVertex::scopep(ScopeCache& cache, bool tryResultVar) VL_MT_DISABLED {
|
|
|
|
|
// If this is a variable, we are done
|
2025-08-20 19:21:24 +02:00
|
|
|
if (const DfgVertexVar* const varp = this->cast<DfgVertexVar>()) {
|
|
|
|
|
return varp->varScopep()->scopep();
|
|
|
|
|
}
|
2025-07-01 23:55:08 +02:00
|
|
|
|
|
|
|
|
// Try the result var first if instructed (usully only in the recursive case)
|
|
|
|
|
if (tryResultVar) {
|
2025-08-20 19:21:24 +02:00
|
|
|
if (const DfgVertexVar* const varp = this->getResultVar()) {
|
|
|
|
|
return varp->varScopep()->scopep();
|
|
|
|
|
}
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
|
2025-07-24 16:31:31 +02:00
|
|
|
// Note: the recursive invocation can cause a re-hash but that will not invalidate references
|
|
|
|
|
AstScope*& resultr = cache[this];
|
|
|
|
|
if (!resultr) {
|
|
|
|
|
// Mark to prevent infinite recursion on circular graphs - should never be called on such
|
|
|
|
|
resultr = reinterpret_cast<AstScope*>(1);
|
2025-07-01 23:55:08 +02:00
|
|
|
// Find scope based on sources, falling back on the root scope
|
|
|
|
|
AstScope* const rootp = v3Global.rootp()->topScopep()->scopep();
|
2025-09-02 17:50:40 +02:00
|
|
|
AstScope* foundp = nullptr;
|
|
|
|
|
foreachSource([&](DfgVertex& src) {
|
|
|
|
|
AstScope* const scp = src.scopep(cache, true);
|
|
|
|
|
if (scp != rootp) {
|
|
|
|
|
foundp = scp;
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
});
|
|
|
|
|
resultr = foundp ? foundp : rootp;
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
|
2025-07-24 16:31:31 +02:00
|
|
|
// Die on a graph circular through operation vertices
|
|
|
|
|
UASSERT_OBJ(resultr != reinterpret_cast<AstScope*>(1), this,
|
2025-07-01 23:55:08 +02:00
|
|
|
"DfgVertex::scopep called on graph with circular operations");
|
|
|
|
|
|
|
|
|
|
// Done
|
2025-07-24 16:31:31 +02:00
|
|
|
return resultr;
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void DfgVertex::unlinkDelete(DfgGraph& dfg) {
|
|
|
|
|
// Unlink sink edges
|
2025-09-02 17:50:40 +02:00
|
|
|
while (!m_sinks.empty()) m_sinks.frontp()->unlinkSrcp();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Remove from graph
|
|
|
|
|
dfg.removeVertex(*this);
|
2025-09-02 17:50:40 +02:00
|
|
|
// Delete - this will unlink sources
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
delete this;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// DfgVisitor
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
#include "V3Dfg__gen_visitor_defns.h" // From ./astgen
|