Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
Its capabilities cover a combination of V3Const and V3Gate, but it is
also more capable of transforming combinational logic into simplified
forms.
This entails adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. The bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
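As a rough illustration of what a peephole pattern such as remove-not-not does, the following minimal sketch rewrites `~~x` to `x` on a toy expression tree. The `Expr` type and the `mkVar`, `mkNot`, `mkAnd`, `removeNotNot`, and `toString` helpers are hypothetical names invented for this example; they are not Verilator's `DfgVertex` classes or the code in V3DfgPeephole.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (hypothetical, not Verilator's DfgVertex)
struct Expr {
    enum class Kind { Var, Not, And } kind;
    std::string name;           // Only meaningful for Kind::Var
    std::shared_ptr<Expr> lhs;  // Operand of Not, or left operand of And
    std::shared_ptr<Expr> rhs;  // Right operand of And
};
using ExprP = std::shared_ptr<Expr>;

ExprP mkVar(const std::string& n) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Var, n, nullptr, nullptr});
}
ExprP mkNot(ExprP a) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Not, "", std::move(a), nullptr});
}
ExprP mkAnd(ExprP a, ExprP b) {
    return std::make_shared<Expr>(Expr{Expr::Kind::And, "", std::move(a), std::move(b)});
}

// Rewrite bottom-up, replacing Not(Not(x)) with x
ExprP removeNotNot(const ExprP& e) {
    if (!e || e->kind == Expr::Kind::Var) return e;
    const ExprP l = removeNotNot(e->lhs);
    const ExprP r = removeNotNot(e->rhs);
    // The pattern: a Not whose (already simplified) operand is itself a Not
    if (e->kind == Expr::Kind::Not && l->kind == Expr::Kind::Not) return l->lhs;
    return std::make_shared<Expr>(Expr{e->kind, e->name, l, r});
}

// Render an expression for inspection
std::string toString(const ExprP& e) {
    switch (e->kind) {
    case Expr::Kind::Var: return e->name;
    case Expr::Kind::Not: return "~" + toString(e->lhs);
    case Expr::Kind::And: return "(" + toString(e->lhs) + " & " + toString(e->rhs) + ")";
    }
    return "";
}
```

Applying `removeNotNot` to `(~~a & ~b)` yields `(a & ~b)`; a single remaining negation is left untouched.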
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yield ~30% single threaded speedup, and ~50% speedup on 4 threads. Not
having to haul around redundant combinational networks in the rest of
the compilation pipeline also helps with memory consumption, and a
reduction of up to 30% in Verilator's peak memory usage was observed on
the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example, OpenTitan gains between 1% and 15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
// -*- mode: C++; c-file-style: "cc-mode" -*-
//*************************************************************************
// DESCRIPTION: Verilator: Data flow graph (DFG) representation of logic
//
// Code available from: https://verilator.org
//
//*************************************************************************
//
// Copyright 2003-2025 by Wilson Snyder. This program is free software; you
// can redistribute it and/or modify it under the terms of either the GNU
// Lesser General Public License Version 3 or the Perl Artistic License
// Version 2.0.
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
//
//*************************************************************************

#include "V3PchAstNoMT.h"  // VL_MT_DISABLED_CODE_UNIT

#include "V3Dfg.h"

#include "V3File.h"

VL_DEFINE_DEBUG_FUNCTIONS;

//------------------------------------------------------------------------------
// DfgGraph
//------------------------------------------------------------------------------

DfgGraph::DfgGraph(AstModule* modulep, const string& name)
    : m_modulep{modulep}
    , m_name{name} {}

DfgGraph::~DfgGraph() {
    forEachVertex([](DfgVertex& vtxp) { delete &vtxp; });
}

std::unique_ptr<DfgGraph> DfgGraph::clone() const {
    const bool scoped = !modulep();

    DfgGraph* const clonep = new DfgGraph{modulep(), name()};

    // Map from original vertex to clone
    std::unordered_map<const DfgVertex*, DfgVertex*> vtxp2clonep(size() * 2);

    // Clone constVertices
    for (const DfgConst& vtx : m_constVertices) {
        DfgConst* const cp = new DfgConst{*clonep, vtx.fileline(), vtx.num()};
        vtxp2clonep.emplace(&vtx, cp);
    }
    // Clone variable vertices
    for (const DfgVertexVar& vtx : m_varVertices) {
        const DfgVertexVar* const vp = vtx.as<DfgVertexVar>();
        DfgVertexVar* cp = nullptr;

        switch (vtx.type()) {
        case VDfgType::atVarArray: {
            if (scoped) {
                cp = new DfgVarArray{*clonep, vp->varScopep()};
            } else {
                cp = new DfgVarArray{*clonep, vp->varp()};
            }
            vtxp2clonep.emplace(&vtx, cp);
            break;
        }
        case VDfgType::atVarPacked: {
            if (scoped) {
                cp = new DfgVarPacked{*clonep, vp->varScopep()};
            } else {
                cp = new DfgVarPacked{*clonep, vp->varp()};
            }
            vtxp2clonep.emplace(&vtx, cp);
            break;
        }
        default: {
            vtx.v3fatalSrc("Unhandled variable vertex type: " + vtx.typeName());
            VL_UNREACHABLE;
            break;
        }
        }
    }
    // Clone operation vertices
    for (const DfgVertex& vtx : m_opVertices) {
        switch (vtx.type()) {
#include "V3Dfg__gen_clone_cases.h"  // From ./astgen
        case VDfgType::atSel: {
            DfgSel* const cp = new DfgSel{*clonep, vtx.fileline(), vtx.dtypep()};
            cp->lsb(vtx.as<DfgSel>()->lsb());
            vtxp2clonep.emplace(&vtx, cp);
            break;
        }
        case VDfgType::atMux: {
            DfgMux* const cp = new DfgMux{*clonep, vtx.fileline(), vtx.dtypep()};
            vtxp2clonep.emplace(&vtx, cp);
            break;
        }
        case VDfgType::atSpliceArray: {
            DfgSpliceArray* const cp = new DfgSpliceArray{*clonep, vtx.fileline(), vtx.dtypep()};
            vtxp2clonep.emplace(&vtx, cp);
            break;
        }
        case VDfgType::atSplicePacked: {
            DfgSplicePacked* const cp = new DfgSplicePacked{*clonep, vtx.fileline(), vtx.dtypep()};
            vtxp2clonep.emplace(&vtx, cp);
            break;
        }
        default: {
            vtx.v3fatalSrc("Unhandled operation vertex type: " + vtx.typeName());
            VL_UNREACHABLE;
            break;
        }
        }
    }
    UASSERT(size() == clonep->size(), "Size of clone should be the same");

    // Constants have no inputs
    // Hook up inputs of cloned variables
    for (const DfgVertexVar& vtx : m_varVertices) {
        // All variable vertices are unary
        if (DfgVertex* const srcp = vtx.srcp()) {
            vtxp2clonep.at(&vtx)->as<DfgVertexVar>()->srcp(vtxp2clonep.at(srcp));
        }
    }
    // Hook up inputs of cloned operation vertices
    for (const DfgVertex& vtx : m_opVertices) {
        if (vtx.is<DfgVertexVariadic>()) {
            switch (vtx.type()) {
            case VDfgType::atSpliceArray: {
                const DfgSpliceArray* const vp = vtx.as<DfgSpliceArray>();
                DfgSpliceArray* const cp = vtxp2clonep.at(vp)->as<DfgSpliceArray>();
                vp->forEachSourceEdge([&](const DfgEdge& edge, size_t i) {
                    if (DfgVertex* const srcp = edge.sourcep()) {
                        cp->addDriver(vp->driverFileLine(i),  //
                                      vp->driverIndex(i),  //
                                      vtxp2clonep.at(srcp));
                    }
                });
                break;
            }
            case VDfgType::atSplicePacked: {
                const DfgSplicePacked* const vp = vtx.as<DfgSplicePacked>();
                DfgSplicePacked* const cp = vtxp2clonep.at(vp)->as<DfgSplicePacked>();
                vp->forEachSourceEdge([&](const DfgEdge& edge, size_t i) {
                    if (DfgVertex* const srcp = edge.sourcep()) {
                        cp->addDriver(vp->driverFileLine(i),  //
                                      vp->driverLsb(i),  //
                                      vtxp2clonep.at(srcp));
                    }
                });
                break;
            }
            default: {
                vtx.v3fatalSrc("Unhandled DfgVertexVariadic sub type: " + vtx.typeName());
                VL_UNREACHABLE;
                break;
            }
            }
        } else {
            DfgVertex* const cp = vtxp2clonep.at(&vtx);
            const auto oSourceEdges = vtx.sourceEdges();
            auto cSourceEdges = cp->sourceEdges();
            UASSERT_OBJ(oSourceEdges.second == cSourceEdges.second, &vtx,
                        "Mismatched source count");
            for (size_t i = 0; i < oSourceEdges.second; ++i) {
                if (DfgVertex* const srcp = oSourceEdges.first[i].sourcep()) {
                    cSourceEdges.first[i].relinkSource(vtxp2clonep.at(srcp));
                }
            }
        }
    }

    return std::unique_ptr<DfgGraph>{clonep};
}
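
The clone above works in two passes: every vertex is first copied and recorded in a map from original to clone, and only then are the inputs of each clone hooked up by looking the sources up in that map. A minimal, self-contained sketch of the same idea on a toy graph follows. `Node`, `Graph`, and `cloneGraph` are hypothetical names used only for illustration and do not reflect the actual `DfgGraph` API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Toy vertex (hypothetical): a payload plus non-owning pointers to its sources
struct Node {
    int value = 0;
    std::vector<Node*> sources;  // An entry may be nullptr (undriven input)
};

// Toy graph (hypothetical): owns its vertices
struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;
    Node* add(int value) {
        nodes.push_back(std::make_unique<Node>());
        nodes.back()->value = value;
        return nodes.back().get();
    }
};

std::unique_ptr<Graph> cloneGraph(const Graph& g) {
    auto clonep = std::make_unique<Graph>();
    // Map from original vertex to clone
    std::unordered_map<const Node*, Node*> orig2clone(g.nodes.size() * 2);
    // Pass 1: create all clones, without edges
    for (const auto& np : g.nodes) orig2clone.emplace(np.get(), clonep->add(np->value));
    // Pass 2: hook up inputs, translating each source through the map
    for (const auto& np : g.nodes) {
        Node* const cp = orig2clone.at(np.get());
        for (Node* const srcp : np->sources) {
            cp->sources.push_back(srcp ? orig2clone.at(srcp) : nullptr);
        }
    }
    return clonep;
}
```

Splitting the work into two passes means edges can be connected in any order, since every source is guaranteed to have a clone by the time the second pass runs.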

void DfgGraph::mergeGraphs(std::vector<std::unique_ptr<DfgGraph>>&& otherps) {
    if (otherps.empty()) return;

    // NODE STATE
    // AstVar/AstVarScope::user2p() -> corresponding DfgVertexVar* in 'this' graph
    const VNUser2InUse user2InUse;

    // Set up Ast Variable -> DfgVertexVar map for 'this' graph
    for (DfgVertexVar& vtx : m_varVertices) vtx.nodep()->user2p(&vtx);

    // Merge in each of the other graphs
    for (const std::unique_ptr<DfgGraph>& otherp : otherps) {
        // Process variables
        for (DfgVertexVar* const vtxp : otherp->m_varVertices.unlinkable()) {
            // Variables that are present in 'this', make them use the DfgVertexVar in 'this'.
            if (DfgVertexVar* const altp = vtxp->nodep()->user2u().to<DfgVertexVar*>()) {
                DfgVertex* const srcp = vtxp->srcp();
                UASSERT_OBJ(!srcp || !altp->srcp(), vtxp, "At most one alias should be driven");
                vtxp->replaceWith(altp);
                if (srcp) altp->srcp(srcp);
                VL_DO_DANGLING(vtxp->unlinkDelete(*otherp), vtxp);
                continue;
            }
            // Otherwise they will be moved
            vtxp->nodep()->user2p(vtxp);
            vtxp->m_userCnt = 0;
            vtxp->m_graphp = this;
        }
        m_varVertices.splice(m_varVertices.end(), otherp->m_varVertices);
        // Process constants
        for (DfgConst& vtx : otherp->m_constVertices) {
            vtx.m_userCnt = 0;
            vtx.m_graphp = this;
        }
        m_constVertices.splice(m_constVertices.end(), otherp->m_constVertices);
        // Process operations
        for (DfgVertex& vtx : otherp->m_opVertices) {
            vtx.m_userCnt = 0;
            vtx.m_graphp = this;
        }
        m_opVertices.splice(m_opVertices.end(), otherp->m_opVertices);
        // Update graph sizes
        m_size += otherp->m_size;
        otherp->m_size = 0;
    }
}

std::string DfgGraph::makeUniqueName(const std::string& prefix, size_t n) {
    // Construct the tmpNameStub if we have not done so yet
    if (m_tmpNameStub.empty()) {
        // Use the hash of the graph name (avoid long names and non-identifiers)
        const std::string name = V3Hash{m_name}.toString();
        // We need to keep every variable globally unique, and graph hashed
        // names might not be, so keep a static table to track multiplicity
        static std::unordered_map<std::string, uint32_t> s_multiplicity;
        m_tmpNameStub += '_' + name + '_' + std::to_string(s_multiplicity[name]++) + '_';
    }
    // Assemble the globally unique name
    return "__Vdfg" + prefix + m_tmpNameStub + std::to_string(n);
}
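
The naming scheme above can be tried out in isolation with the stand-alone sketch below. It assumes `std::hash` as a stand-in for `V3Hash`, so the stub text differs from what Verilator would produce, and `NameMaker` is a hypothetical class that exists only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the naming part of a graph
class NameMaker {
    std::string m_name;  // Name of the (imaginary) graph
    std::string m_stub;  // Lazily constructed per-instance name stub
public:
    explicit NameMaker(const std::string& name)
        : m_name{name} {}

    std::string makeUniqueName(const std::string& prefix, std::size_t n) {
        // Construct the stub on first use
        if (m_stub.empty()) {
            // Assumption: std::hash replaces V3Hash here
            const std::string hashed = std::to_string(std::hash<std::string>{}(m_name));
            // Instances with the same hashed name get distinct counters
            static std::unordered_map<std::string, std::uint32_t> s_multiplicity;
            m_stub = '_' + hashed + '_' + std::to_string(s_multiplicity[hashed]++) + '_';
        }
        return "__Vdfg" + prefix + m_stub + std::to_string(n);
    }
};
```

Two `NameMaker` instances built from the same name receive different multiplicity counters, so their generated names never collide, while repeated calls on one instance with the same `prefix` and `n` return the same string.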

DfgVertexVar* DfgGraph::makeNewVar(FileLine* flp, const std::string& name, AstNodeDType* dtypep,
                                   AstScope* scopep) {
    UASSERT_OBJ(!!scopep != !!modulep(), flp,
                "makeNewVar scopep should only be provided for a scoped DfgGraph");

    // Create AstVar
    AstVar* const varp = new AstVar{flp, VVarType::MODULETEMP, name, dtypep};

    if (scopep) {
        // Add AstVar to the scope's module
        scopep->modp()->addStmtsp(varp);
        // Create AstVarScope
        AstVarScope* const vscp = new AstVarScope{flp, scopep, varp};
        // Add to scope
        scopep->addVarsp(vscp);
        // Create and return the corresponding variable vertex
        if (VN_IS(varp->dtypeSkipRefp(), UnpackArrayDType)) return new DfgVarArray{*this, vscp};
        return new DfgVarPacked{*this, vscp};
    } else {
        // Add AstVar to containing module
        modulep()->addStmtsp(varp);
        // Create and return the corresponding variable vertex
        if (VN_IS(varp->dtypeSkipRefp(), UnpackArrayDType)) return new DfgVarArray{*this, varp};
        return new DfgVarPacked{*this, varp};
    }
}

static const string toDotId(const DfgVertex& vtx) { return '"' + cvtToHex(&vtx) + '"'; }

// Dump one DfgVertex in Graphviz format
static void dumpDotVertex(std::ostream& os, const DfgVertex& vtx) {

    if (const DfgVarPacked* const varVtxp = vtx.cast<DfgVarPacked>()) {
        AstNode* const nodep = varVtxp->nodep();
        AstVar* const varp = varVtxp->varp();
        os << toDotId(vtx);
        os << " [label=\"" << nodep->name() << '\n';
        os << cvtToHex(varVtxp) << '\n';
        varVtxp->dtypep()->dumpSmall(os);
        os << " / F" << varVtxp->fanout() << '"';

        if (varp->direction() == VDirection::INPUT) {
            os << ", shape=box, style=filled, fillcolor=chartreuse2";  // Green
        } else if (varp->direction() == VDirection::OUTPUT) {
            os << ", shape=box, style=filled, fillcolor=cyan2";  // Cyan
        } else if (varp->direction() == VDirection::INOUT) {
            os << ", shape=box, style=filled, fillcolor=darkorchid2";  // Purple
        } else if (varVtxp->hasExtRefs()) {
            os << ", shape=box, style=filled, fillcolor=firebrick2"; // Red
        } else if (varVtxp->hasModRefs()) {
            os << ", shape=box, style=filled, fillcolor=darkorange1"; // Orange
        } else if (varVtxp->hasDfgRefs()) {
            os << ", shape=box, style=filled, fillcolor=gold2"; // Yellow
        } else {
            os << ", shape=box";
        }
        os << "]\n";
        return;
    }

    if (const DfgVarArray* const arrVtxp = vtx.cast<DfgVarArray>()) {
        AstNode* const nodep = arrVtxp->nodep();
        AstVar* const varp = arrVtxp->varp();
        os << toDotId(vtx);
        os << " [label=\"" << nodep->name() << '\n';
        os << cvtToHex(arrVtxp) << '\n';
        arrVtxp->dtypep()->dumpSmall(os);
        os << " / F" << arrVtxp->fanout() << '"';
        if (varp->direction() == VDirection::INPUT) {
            os << ", shape=box3d, style=filled, fillcolor=chartreuse2"; // Green
        } else if (varp->direction() == VDirection::OUTPUT) {
            os << ", shape=box3d, style=filled, fillcolor=cyan2"; // Cyan
        } else if (varp->direction() == VDirection::INOUT) {
            os << ", shape=box3d, style=filled, fillcolor=darkorchid2"; // Purple
        } else if (arrVtxp->hasExtRefs()) {
            os << ", shape=box3d, style=filled, fillcolor=firebrick2"; // Red
        } else if (arrVtxp->hasModRefs()) {
            os << ", shape=box3d, style=filled, fillcolor=darkorange1"; // Orange
        } else if (arrVtxp->hasDfgRefs()) {
            os << ", shape=box3d, style=filled, fillcolor=gold2"; // Yellow
        } else {
            os << ", shape=box3d";
        }
        os << "]\n";
        return;
    }

    if (const DfgConst* const constVtxp = vtx.cast<DfgConst>()) {
        const V3Number& num = constVtxp->num();

        os << toDotId(vtx);
        os << " [label=\"";
        if (num.width() <= 32 && !num.isSigned()) {
            os << constVtxp->width() << "'d" << num.toUInt() << '\n';
            os << constVtxp->width() << "'h" << std::hex << num.toUInt() << std::dec << '\n';
        } else {
            os << num.ascii() << '\n';
        }
        os << cvtToHex(constVtxp) << '\n';
        os << '"';
        os << ", shape=plain";
        os << "]\n";
        return;
    }

    if (const DfgSel* const selVtxp = vtx.cast<DfgSel>()) {
        const uint32_t lsb = selVtxp->lsb();
        const uint32_t msb = lsb + selVtxp->width() - 1;
        os << toDotId(vtx);
        os << " [label=\"SEL _[" << msb << ":" << lsb << "]\n";
        os << cvtToHex(selVtxp) << '\n';
        vtx.dtypep()->dumpSmall(os);
        os << " / F" << vtx.fanout() << '"';
        if (vtx.hasMultipleSinks()) {
            os << ", shape=doublecircle";
        } else {
            os << ", shape=circle";
        }
        os << "]\n";
        return;
    }

    if (vtx.is<DfgVertexSplice>()) {
        os << toDotId(vtx);
        os << " [label=\"" << vtx.typeName() << '\n';
        os << cvtToHex(&vtx) << '\n';
        vtx.dtypep()->dumpSmall(os);
        os << " / F" << vtx.fanout() << '"';
        if (vtx.hasMultipleSinks()) {
            os << ", shape=doubleoctagon";
        } else {
            os << ", shape=octagon";
        }
        os << "]\n";
        return;
    }

    os << toDotId(vtx);
    os << " [label=\"" << vtx.typeName() << '\n';
    os << cvtToHex(&vtx) << '\n';
    vtx.dtypep()->dumpSmall(os);
    os << " / F" << vtx.fanout() << '"';
    if (vtx.hasMultipleSinks()) {
        os << ", shape=doublecircle";
    } else {
        os << ", shape=circle";
    }
    os << "]\n";
}

// Dump one DfgEdge in Graphviz format
static void dumpDotEdge(std::ostream& os, const DfgEdge& edge, const string& headlabel) {
    os << toDotId(*edge.sourcep()) << " -> " << toDotId(*edge.sinkp());
    if (!headlabel.empty()) os << " [headlabel=\"" << headlabel << "\"]";
    os << '\n';
}

// Dump one DfgVertex and all of its source DfgEdges in Graphviz format
static void dumpDotVertexAndSourceEdges(std::ostream& os, const DfgVertex& vtx) {
    dumpDotVertex(os, vtx);
    vtx.forEachSourceEdge([&](const DfgEdge& edge, size_t idx) { //
        if (edge.sourcep()) {
            string headLabel;
            if (vtx.arity() > 1 || vtx.is<DfgVertexSplice>()) headLabel = vtx.srcName(idx);
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
dumpDotEdge(os, edge, headLabel);
|
|
|
|
|
}
|
|
|
|
|
});
|
|
|
|
|
}

void DfgGraph::dumpDot(std::ostream& os, const string& label) const {
    // Header
    os << "digraph dfg {\n";
    os << "graph [label=\"" << name();
    if (!label.empty()) os << "-" << label;
    os << "\", labelloc=t, labeljust=l]\n";
    os << "graph [rankdir=LR]\n";

    // Emit all vertices
    forEachVertex([&](const DfgVertex& vtx) { dumpDotVertexAndSourceEdges(os, vtx); });

    // Footer
    os << "}\n";
}

void DfgGraph::dumpDotFile(const string& filename, const string& label) const {
    // This generates a file used by graphviz, https://www.graphviz.org
    // "hardcoded" parameters:
    const std::unique_ptr<std::ofstream> os{V3File::new_ofstream(filename)};
    if (os->fail()) v3fatal("Can't write file: " << filename);
    dumpDot(*os.get(), label);
    os->close();
}

void DfgGraph::dumpDotFilePrefixed(const string& label) const {
    string filename = name();
    if (!label.empty()) filename += "-" + label;
    dumpDotFile(v3Global.debugFilename(filename) + ".dot", label);
}

// LCOV_EXCL_START // Debug function for developer use only
void DfgGraph::dumpDotUpstreamCone(const string& fileName, const DfgVertex& vtx,
                                   const string& name) const {
    // Open output file
    const std::unique_ptr<std::ofstream> os{V3File::new_ofstream(fileName)};
    if (os->fail()) v3fatal("Can't write file: " << fileName);

    // Header
    *os << "digraph dfg {\n";
    if (!name.empty()) *os << "graph [label=\"" << name << "\", labelloc=t, labeljust=l]\n";
    *os << "graph [rankdir=LR]\n";

    // Work queue for depth first traversal starting from this vertex
    std::vector<const DfgVertex*> queue{&vtx};

    // Set of already visited vertices
    std::unordered_set<const DfgVertex*> visited;

    // Depth first traversal
    while (!queue.empty()) {
        // Pop next work item
        const DfgVertex* const vtxp = queue.back();
        queue.pop_back();

        // Mark vertex as visited
        const bool isFirstEncounter = visited.insert(vtxp).second;

        // If we have already visited this vertex during the traversal, then move on.
        if (!isFirstEncounter) continue;

        // Enqueue all sources of this vertex.
        vtxp->forEachSource([&](const DfgVertex& src) { queue.push_back(&src); });

        // Emit this vertex and all of its source edges
        dumpDotVertexAndSourceEdges(*os, *vtxp);
    }

    // Footer
    *os << "}\n";

    // Done
    os->close();
}
// LCOV_EXCL_STOP

//------------------------------------------------------------------------------
// DfgEdge
//------------------------------------------------------------------------------

void DfgEdge::unlinkSource() {
    if (!m_sourcep) return;
#ifdef VL_DEBUG
    {
        DfgEdge* sinkp = m_sourcep->m_sinksp;
        while (sinkp) {
            if (sinkp == this) break;
            sinkp = sinkp->m_nextp;
        }
        UASSERT(sinkp, "'m_sourcep' does not have this edge as sink");
    }
#endif
    // Relink pointers of predecessor and successor
    if (m_prevp) m_prevp->m_nextp = m_nextp;
    if (m_nextp) m_nextp->m_prevp = m_prevp;
    // If head of list in source, update source's head pointer
    if (m_sourcep->m_sinksp == this) m_sourcep->m_sinksp = m_nextp;
    // Mark source as unconnected
    m_sourcep = nullptr;
    // Clear links. This is not strictly necessary, but might catch bugs.
    m_prevp = nullptr;
    m_nextp = nullptr;
}

void DfgEdge::relinkSource(DfgVertex* newSourcep) {
    // Unlink current source, if any
    unlinkSource();
    // Link new source
    m_sourcep = newSourcep;
    // Prepend to sink list in source
    m_nextp = newSourcep->m_sinksp;
    if (m_nextp) m_nextp->m_prevp = this;
    newSourcep->m_sinksp = this;
}
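
// Illustrative sketch, not part of this file: the two functions above maintain an
// intrusive doubly linked list of sink edges hanging off each source vertex. The
// standalone example below reproduces that bookkeeping with hypothetical 'Edge' and
// 'Vertex' stand-ins (the names and the 'fanout' helper are assumptions made for the
// example; the real classes carry more state).

```cpp
#include <cassert>
#include <cstddef>

struct Vertex;

// Hypothetical stand-in for DfgEdge: a node of an intrusive doubly linked list
struct Edge {
    Vertex* sourcep = nullptr;  // Vertex driving this edge, nullptr if unconnected
    Edge* prevp = nullptr;  // Previous edge in the sink list of 'sourcep'
    Edge* nextp = nullptr;  // Next edge in the sink list of 'sourcep'
    void unlinkSource();
    void relinkSource(Vertex* newSourcep);
};

// Hypothetical stand-in for DfgVertex: only holds the head of its sink list
struct Vertex {
    Edge* sinksp = nullptr;
    // Count the edges driven by this vertex by walking the list
    std::size_t fanout() const {
        std::size_t n = 0;
        for (const Edge* ep = sinksp; ep; ep = ep->nextp) ++n;
        return n;
    }
};

void Edge::unlinkSource() {
    if (!sourcep) return;
    // Bypass this edge in the list
    if (prevp) prevp->nextp = nextp;
    if (nextp) nextp->prevp = prevp;
    // If this edge was the head, advance the head pointer of the source
    if (sourcep->sinksp == this) sourcep->sinksp = nextp;
    sourcep = nullptr;
    prevp = nullptr;
    nextp = nullptr;
}

void Edge::relinkSource(Vertex* newSourcep) {
    unlinkSource();
    sourcep = newSourcep;
    // Prepend to the sink list of the new source, which is O(1)
    nextp = newSourcep->sinksp;
    if (nextp) nextp->prevp = this;
    newSourcep->sinksp = this;
}
```

// Because the links live inside the edge itself, unlinking and relinking need no
// allocation and no search; the cost is that counting sinks requires a list walk.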

//------------------------------------------------------------------------------
// DfgVertex
//------------------------------------------------------------------------------

DfgVertex::DfgVertex(DfgGraph& dfg, VDfgType type, FileLine* flp, AstNodeDType* dtypep)
    : m_filelinep{flp}
    , m_dtypep{dtypep}
    , m_type{type} {
    dfg.addVertex(*this);
}

DfgVertex::~DfgVertex() {}

bool DfgVertex::selfEquals(const DfgVertex& that) const { return true; }

V3Hash DfgVertex::selfHash() const { return V3Hash{}; }

bool DfgVertex::equals(const DfgVertex& that, EqualsCache& cache) const {
    // If same vertex, then equal
    if (this == &that) return true;

    // If different type, then not equal
    if (this->type() != that.type()) return false;

    // If different data type, then not equal
    if (this->dtypep() != that.dtypep()) return false;

    // If different number of inputs, then not equal
    auto thisPair = this->sourceEdges();
    const DfgEdge* const thisSrcEdgesp = thisPair.first;
    const size_t thisArity = thisPair.second;
    auto thatPair = that.sourceEdges();
    const DfgEdge* const thatSrcEdgesp = thatPair.first;
    const size_t thatArity = thatPair.second;
    if (thisArity != thatArity) return false;

    // Check vertex specifics
    if (!this->selfEquals(that)) return false;

    // Check sources
    const auto key = (this < &that) ? EqualsCache::key_type{this, &that}  //
                                    : EqualsCache::key_type{&that, this};
    // Note: the recursive invocation can cause a re-hash but that will not invalidate references
    uint8_t& result = cache[key];
    if (!result) {
        result = 2;  // Assume equals
        for (size_t i = 0; i < thisArity; ++i) {
            const DfgVertex* const thisSrcVtxp = thisSrcEdgesp[i].m_sourcep;
            const DfgVertex* const thatSrcVtxp = thatSrcEdgesp[i].m_sourcep;
            if (thisSrcVtxp == thatSrcVtxp) continue;
            if (!thisSrcVtxp || !thatSrcVtxp || !thisSrcVtxp->equals(*thatSrcVtxp, cache)) {
                result = 1;  // Mark not equal
                break;
            }
        }
    }
    return result >> 1;
}
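
// Illustrative sketch, not part of this file: 'equals' above memoizes pairwise
// structural comparison in a cache whose entries are 0 (not yet computed), 1 (known
// not equal) or 2 (known equal), keyed by the canonically ordered pair of vertices.
// The standalone example below shows the same scheme on a hypothetical 'Node' type.
// Assumptions made for the example: a 'std::map' keyed by the addresses converted to
// integers stands in for the real 'EqualsCache', and 'op' stands in for the type,
// data type and vertex specific checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical expression node: an opcode and its ordered operands
struct Node {
    int op;
    std::vector<const Node*> srcs;
};

// Cache value: 0 = not yet computed, 1 = not equal, 2 = equal
using EqualsCache = std::map<std::pair<std::uintptr_t, std::uintptr_t>, std::uint8_t>;

bool equals(const Node& a, const Node& b, EqualsCache& cache) {
    // Cheap checks first, none of them needs the cache
    if (&a == &b) return true;
    if (a.op != b.op) return false;
    if (a.srcs.size() != b.srcs.size()) return false;
    // Order the key so that (a, b) and (b, a) share one cache entry
    const auto ka = reinterpret_cast<std::uintptr_t>(&a);
    const auto kb = reinterpret_cast<std::uintptr_t>(&b);
    const auto key = ka < kb ? std::make_pair(ka, kb) : std::make_pair(kb, ka);
    // References into std::map stay valid while the recursion inserts more entries
    std::uint8_t& result = cache[key];
    if (!result) {
        result = 2;  // Assume equal until an operand pair differs
        for (std::size_t i = 0; i < a.srcs.size(); ++i) {
            if (!equals(*a.srcs[i], *b.srcs[i], cache)) {
                result = 1;  // Mark not equal
                break;
            }
        }
    }
    return result >> 1;  // 2 -> true, 1 -> false
}
```

// With the cache shared across calls, a pair of sub-expressions reachable from many
// parents is compared at most once, which matters for graphs with heavy sharing.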
|
|
|
|
|
|
2022-10-06 12:26:11 +02:00
|
|
|
V3Hash DfgVertex::hash() {
|
|
|
|
|
V3Hash& result = user<V3Hash>();
|
|
|
|
|
if (!result.value()) {
|
2022-10-21 11:50:02 +02:00
|
|
|
V3Hash hash{selfHash()};
|
|
|
|
|
// Variables are defined by themselves, so there is no need to hash them further
|
|
|
|
|
// (especially the sources). This enables sound hashing of graphs circular only through
|
|
|
|
|
// variables, which we rely on.
|
2022-10-04 12:03:41 +02:00
|
|
|
if (!is<DfgVertexVar>()) {
|
2022-10-21 11:50:02 +02:00
|
|
|
hash += m_type;
|
2025-07-14 23:09:34 +02:00
|
|
|
if (AstUnpackArrayDType* const adtypep = VN_CAST(dtypep(), UnpackArrayDType)) {
|
|
|
|
|
hash += adtypep->elementsConst();
|
|
|
|
|
// TODO: maybe include sub-dtype, but not hugely important at the moment
|
|
|
|
|
} else {
|
|
|
|
|
hash += width();
|
|
|
|
|
}
|
2022-10-06 12:26:11 +02:00
|
|
|
const auto pair = sourceEdges();
|
|
|
|
|
const DfgEdge* const edgesp = pair.first;
|
|
|
|
|
const size_t arity = pair.second;
|
|
|
|
|
// Sources must always be connected in well-formed graphs
|
|
|
|
|
for (size_t i = 0; i < arity; ++i) hash += edgesp[i].m_sourcep->hash();
|
2022-09-27 14:50:37 +02:00
|
|
|
}
|
2022-10-06 12:26:11 +02:00
|
|
|
result = hash;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yield ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine, not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to a 30% reduction in peak memory usage of Verilator
was observed on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
return result;
|
|
|
|
|
}
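The tail of the hashing routine above adds the hash of every source vertex to the vertex's own hash and stores the sum in `result`. A minimal sketch of that idea follows, assuming `result` refers to a per-vertex cache entry (the excerpt does not show where it comes from). `ToyVertex`, `HashCache`, `combine` and `structuralHash` are hypothetical names for illustration, not Verilator APIs, and the combining function is an arbitrary choice rather than what `V3Hash` does.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a DFG vertex: an operation tag plus its sources.
struct ToyVertex {
    uint32_t op;                            // plays the role of the vertex's own hash input
    std::vector<const ToyVertex*> sources;  // always connected in this sketch
};

// Assumed cache: one memoized hash per vertex.
using HashCache = std::unordered_map<const ToyVertex*, uint64_t>;

// Order-sensitive combine (arbitrary choice for this sketch).
inline uint64_t combine(uint64_t a, uint64_t b) { return a * 31 + b; }

// Hash of a vertex = own tag combined with the hashes of all sources, memoized.
uint64_t structuralHash(const ToyVertex& v, HashCache& cache) {
    const auto it = cache.find(&v);
    if (it != cache.end()) return it->second;
    uint64_t hash = v.op;
    for (const ToyVertex* srcp : v.sources) hash = combine(hash, structuralHash(*srcp, cache));
    cache.emplace(&v, hash);
    return hash;
}
```

Two vertices with the same operation and the same sources in the same order get the same hash, which is the property a common sub-expression pass needs before it compares candidates for equality.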
|
|
|
|
|
|
|
|
|
|
uint32_t DfgVertex::fanout() const {
|
|
|
|
|
uint32_t result = 0;
|
|
|
|
|
forEachSinkEdge([&](const DfgEdge&) { ++result; });
|
|
|
|
|
return result;
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-14 23:09:34 +02:00
|
|
|
DfgVertexVar* DfgVertex::getResultVar() {
|
2025-06-16 13:25:44 +02:00
|
|
|
// It's easy if the vertex is already a variable ...
|
2025-07-14 23:09:34 +02:00
|
|
|
if (DfgVertexVar* const varp = this->cast<DfgVertexVar>()) return varp;
|
2025-06-16 13:25:44 +02:00
|
|
|
|
2025-07-14 23:09:34 +02:00
|
|
|
// Inspect existing variables written by this vertex, and choose one
|
|
|
|
|
DfgVertexVar* resp = nullptr;
|
2025-06-28 18:29:41 +02:00
|
|
|
// cppcheck-has-bug-suppress constParameter
|
2025-06-16 13:25:44 +02:00
|
|
|
this->forEachSink([&resp](DfgVertex& sink) {
|
2025-07-14 23:09:34 +02:00
|
|
|
DfgVertexVar* const varp = sink.cast<DfgVertexVar>();
|
2025-06-16 13:25:44 +02:00
|
|
|
if (!varp) return;
|
|
|
|
|
// First variable found
|
|
|
|
|
if (!resp) {
|
|
|
|
|
resp = varp;
|
|
|
|
|
return;
|
|
|
|
|
}
|
2025-08-05 11:24:54 +02:00
|
|
|
|
2025-06-16 13:25:44 +02:00
|
|
|
// Prefer those variables that must be kept anyway
|
2025-08-05 11:24:54 +02:00
|
|
|
if (resp->hasExtRefs() != varp->hasExtRefs()) {
|
|
|
|
|
if (!resp->hasExtRefs()) resp = varp;
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
if (resp->hasModWrRefs() != varp->hasModWrRefs()) {
|
|
|
|
|
if (!resp->hasModWrRefs()) resp = varp;
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
if (resp->hasDfgRefs() != varp->hasDfgRefs()) {
|
|
|
|
|
if (!resp->hasDfgRefs()) resp = varp;
|
2025-06-16 13:25:44 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
// Prefer those that already have module references
|
2025-08-05 11:24:54 +02:00
|
|
|
if (resp->hasModRdRefs() != varp->hasModRdRefs()) {
|
|
|
|
|
if (!resp->hasModRdRefs()) resp = varp;
|
2025-06-16 13:25:44 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
// Prefer the earlier one in source order
|
|
|
|
|
const FileLine& oldFlp = *(resp->fileline());
|
|
|
|
|
const FileLine& newFlp = *(varp->fileline());
|
|
|
|
|
if (const int cmp = oldFlp.operatorCompare(newFlp)) {
|
|
|
|
|
if (cmp > 0) resp = varp;
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
// Prefer the one with the lexically smaller name
|
2025-07-01 23:55:08 +02:00
|
|
|
if (const int cmp = resp->nodep()->name().compare(varp->nodep()->name())) {
|
2025-06-16 13:25:44 +02:00
|
|
|
if (cmp > 0) resp = varp;
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
// 'resp' and 'varp' rank the same on every criterion, keep using the existing 'resp'
|
|
|
|
|
});
|
|
|
|
|
return resp;
|
|
|
|
|
}
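The selection in `getResultVar` is a chain of tie-breakers: each criterion decides only when the two candidates differ on it, otherwise the next one is consulted. The sketch below restates that ordering as a single predicate over a toy record. `ToyVar`, `preferCandidate` and `pickResultVar` are hypothetical names, and the integer `line` is a simplified stand-in for the `FileLine` comparison.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flattened view of the properties getResultVar inspects.
struct ToyVar {
    std::string name;
    int line;  // stand-in for source order
    bool extRefs, modWrRefs, dfgRefs, modRdRefs;
};

// Returns true if 'cand' should replace the current choice 'cur'.
bool preferCandidate(const ToyVar& cur, const ToyVar& cand) {
    // Variables that must be kept anyway win first
    if (cur.extRefs != cand.extRefs) return cand.extRefs;
    if (cur.modWrRefs != cand.modWrRefs) return cand.modWrRefs;
    if (cur.dfgRefs != cand.dfgRefs) return cand.dfgRefs;
    // Then those that already have module read references
    if (cur.modRdRefs != cand.modRdRefs) return cand.modRdRefs;
    // Then the earlier one in source order
    if (cur.line != cand.line) return cand.line < cur.line;
    // Finally the lexically smaller name; on a full tie keep 'cur'
    return cand.name < cur.name;
}

// Scans all candidates and returns the preferred one, or nullptr if there are none.
const ToyVar* pickResultVar(const std::vector<ToyVar>& vars) {
    const ToyVar* resp = nullptr;
    for (const ToyVar& v : vars) {
        if (!resp || preferCandidate(*resp, v)) resp = &v;
    }
    return resp;
}
```

Because the name comparison is the last resort, the result does not depend on the order in which sinks are visited, as long as names are unique among the candidates.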
|
|
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
AstScope* DfgVertex::scopep(ScopeCache& cache, bool tryResultVar) VL_MT_DISABLED {
|
|
|
|
|
// If this is a variable, we are done
|
|
|
|
|
if (DfgVertexVar* const varp = this->cast<DfgVertexVar>()) return varp->varScopep()->scopep();
|
|
|
|
|
|
|
|
|
|
// Try the result var first if instructed (usually only in the recursive case)
|
|
|
|
|
if (tryResultVar) {
|
|
|
|
|
if (DfgVertexVar* const varp = this->getResultVar()) return varp->varScopep()->scopep();
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-24 16:31:31 +02:00
|
|
|
// Note: the recursive invocation can cause a re-hash but that will not invalidate references
|
|
|
|
|
AstScope*& resultr = cache[this];
|
|
|
|
|
if (!resultr) {
|
|
|
|
|
// Mark to prevent infinite recursion on circular graphs - should never be called on such
|
|
|
|
|
resultr = reinterpret_cast<AstScope*>(1);
|
2025-07-01 23:55:08 +02:00
|
|
|
// Find scope based on sources, falling back on the root scope
|
|
|
|
|
AstScope* const rootp = v3Global.rootp()->topScopep()->scopep();
|
|
|
|
|
AstScope* foundp = rootp;
|
|
|
|
|
const auto edges = sourceEdges();
|
|
|
|
|
for (size_t i = 0; i < edges.second; ++i) {
|
|
|
|
|
DfgEdge& edge = edges.first[i];
|
|
|
|
|
foundp = edge.sourcep()->scopep(cache, true);
|
|
|
|
|
if (foundp != rootp) break;
|
|
|
|
|
}
|
2025-07-24 16:31:31 +02:00
|
|
|
resultr = foundp;
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
|
2025-07-24 16:31:31 +02:00
|
|
|
// Die on a graph circular through operation vertices
|
|
|
|
|
UASSERT_OBJ(resultr != reinterpret_cast<AstScope*>(1), this,
|
2025-07-01 23:55:08 +02:00
|
|
|
"DfgVertex::scopep called on graph with circular operations");
|
|
|
|
|
|
|
|
|
|
// Done
|
2025-07-24 16:31:31 +02:00
|
|
|
return resultr;
|
2025-07-01 23:55:08 +02:00
|
|
|
}
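`scopep` combines memoization with an in-progress marker: the cache entry is set to a sentinel before recursing into the sources, so re-entering the same vertex can be detected instead of recursing forever. A minimal sketch of that pattern follows, using integers for scopes and an exception in place of `UASSERT_OBJ`. `ToyNode`, `ScopeMemo`, `ROOT_SCOPE`, `IN_PROGRESS` and `resolveScope` are hypothetical names, and the `tryResultVar` shortcut is left out.

```cpp
#include <cassert>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical node: ownScope > 0 for "variables", 0 for operations.
struct ToyNode {
    int ownScope;
    std::vector<const ToyNode*> sources;
};

constexpr int ROOT_SCOPE = 1;    // fallback when no source yields a better scope
constexpr int IN_PROGRESS = -1;  // sentinel, like the pointer value 1 in the excerpt

using ScopeMemo = std::unordered_map<const ToyNode*, int>;

int resolveScope(const ToyNode& node, ScopeMemo& memo) {
    // Variables know their scope directly
    if (node.ownScope > 0) return node.ownScope;
    // operator[] value-initializes new entries to 0, meaning "not computed yet".
    // Recursive insertions may rehash, but references to elements stay valid.
    int& resultr = memo[&node];
    if (resultr == 0) {
        resultr = IN_PROGRESS;
        int found = ROOT_SCOPE;
        for (const ToyNode* srcp : node.sources) {
            found = resolveScope(*srcp, memo);
            if (found != ROOT_SCOPE) break;  // first non-root scope wins
        }
        resultr = found;
    }
    // Still marked: we re-entered this node while computing it
    if (resultr == IN_PROGRESS) throw std::logic_error("circular operations");
    return resultr;
}
```

Holding `resultr` across the recursive calls is safe with `std::unordered_map` because rehashing invalidates iterators but not references to elements; the same would not hold for a container that relocates its elements.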
|
|
|
|
|
|
2022-09-23 17:46:22 +02:00
|
|
|
void DfgVertex::unlinkDelete(DfgGraph& dfg) {
|
|
|
|
|
// Unlink source edges
|
|
|
|
|
forEachSourceEdge([](DfgEdge& edge, size_t) { edge.unlinkSource(); });
|
|
|
|
|
// Unlink sink edges
|
|
|
|
|
forEachSinkEdge([](DfgEdge& edge) { edge.unlinkSource(); });
|
|
|
|
|
// Remove from graph
|
|
|
|
|
dfg.removeVertex(*this);
|
|
|
|
|
// Delete
|
|
|
|
|
delete this;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void DfgVertex::replaceWith(DfgVertex* newSourcep) {
|
|
|
|
|
while (m_sinksp) m_sinksp->relinkSource(newSourcep);
|
|
|
|
|
}
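The loop in `replaceWith` terminates because relinking an edge removes it from this vertex's sink list, so the list head advances until the list is empty. The sketch below shows that mechanism with a toy intrusive singly linked list. `ToyEdge` and `ToyVertex` are hypothetical, and the real `DfgEdge` list layout is an assumption not shown in the excerpt.

```cpp
#include <cassert>
#include <cstddef>

struct ToyVertex;

// Hypothetical edge: knows its source and its successor in the source's sink list.
struct ToyEdge {
    ToyVertex* sourcep = nullptr;
    ToyEdge* nextp = nullptr;
    void unlinkSource();
    void relinkSource(ToyVertex* newSourcep);
};

struct ToyVertex {
    ToyEdge* sinksp = nullptr;  // head of the list of edges driven by this vertex

    // Number of edges driven by this vertex
    size_t fanout() const {
        size_t n = 0;
        for (const ToyEdge* ep = sinksp; ep; ep = ep->nextp) ++n;
        return n;
    }

    // Move every sink edge over to 'newSourcep'; must not be called with 'this'
    void replaceWith(ToyVertex* newSourcep) {
        while (sinksp) sinksp->relinkSource(newSourcep);
    }
};

// Remove this edge from its current source's sink list, if it has a source
void ToyEdge::unlinkSource() {
    if (!sourcep) return;
    ToyEdge** pp = &sourcep->sinksp;
    while (*pp != this) pp = &(*pp)->nextp;
    *pp = nextp;
    nextp = nullptr;
    sourcep = nullptr;
}

// Detach from the old source, then push onto the new source's sink list
void ToyEdge::relinkSource(ToyVertex* newSourcep) {
    unlinkSource();
    sourcep = newSourcep;
    nextp = newSourcep->sinksp;
    newSourcep->sinksp = this;
}
```

In this sketch, replacing a vertex with itself would never empty the list, so callers are expected to pass a different vertex.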
|
|
|
|
|
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// Vertex classes
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
|
2022-10-06 19:34:18 +02:00
|
|
|
// DfgConst ----------
|
|
|
|
|
|
|
|
|
|
bool DfgConst::selfEquals(const DfgVertex& that) const {
|
2022-10-07 16:44:14 +02:00
|
|
|
return num().isCaseEq(that.as<DfgConst>()->num());
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-07 16:44:14 +02:00
|
|
|
V3Hash DfgConst::selfHash() const { return num().toHash(); }
|
2022-10-06 19:34:18 +02:00
|
|
|
|
|
|
|
|
// DfgSel ----------
|
|
|
|
|
|
|
|
|
|
bool DfgSel::selfEquals(const DfgVertex& that) const { return lsb() == that.as<DfgSel>()->lsb(); }
|
|
|
|
|
|
|
|
|
|
V3Hash DfgSel::selfHash() const { return V3Hash{lsb()}; }
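The `selfEquals`/`selfHash` pair follows the usual contract that vertices comparing equal must hash equal, which is what lets a hash table find duplicate expressions. A minimal sketch of using such a pair to spot a redundant select follows. It folds the source and the width into the key alongside `lsb`, which is an assumption about what the surrounding comparison checks, and `ToySel`, `ToySelHash`, `ToySelEq`, `SelTable` and `isRedundant` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical select operation: bits [lsb +: width] of some source vertex.
struct ToySel {
    int sourceId;  // hypothetical identifier of the source vertex
    uint32_t lsb;
    uint32_t width;
};

// Hash over exactly the fields that the equality below compares.
struct ToySelHash {
    size_t operator()(const ToySel& s) const {
        size_t h = std::hash<int>{}(s.sourceId);
        h = h * 31 + std::hash<uint32_t>{}(s.lsb);
        h = h * 31 + std::hash<uint32_t>{}(s.width);
        return h;
    }
};

struct ToySelEq {
    bool operator()(const ToySel& a, const ToySel& b) const {
        return a.sourceId == b.sourceId && a.lsb == b.lsb && a.width == b.width;
    }
};

using SelTable = std::unordered_set<ToySel, ToySelHash, ToySelEq>;

// Returns true if an equivalent select was already recorded, i.e. 'sel' is redundant.
bool isRedundant(SelTable& table, const ToySel& sel) { return !table.insert(sel).second; }
```

Hashing a field that equality ignores would break the lookup, so the two functions are best kept side by side and changed together, as the excerpt does for each vertex class.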
|
|
|
|
|
|
2025-07-14 23:09:34 +02:00
|
|
|
// DfgSpliceArray ----------
|
|
|
|
|
|
|
|
|
|
bool DfgSpliceArray::selfEquals(const DfgVertex& that) const {
|
|
|
|
|
const DfgSpliceArray* const thatp = that.as<DfgSpliceArray>();
|
|
|
|
|
const size_t arity = this->arity();
|
|
|
|
|
for (size_t i = 0; i < arity; ++i) {
|
|
|
|
|
if (driverIndex(i) != thatp->driverIndex(i)) return false;
|
|
|
|
|
}
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
V3Hash DfgSpliceArray::selfHash() const {
|
|
|
|
|
V3Hash hash;
|
|
|
|
|
const size_t arity = this->arity();
|
|
|
|
|
for (size_t i = 0; i < arity; ++i) hash += driverIndex(i);
|
|
|
|
|
return hash;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// DfgSplicePacked ----------
|
|
|
|
|
|
|
|
|
|
bool DfgSplicePacked::selfEquals(const DfgVertex& that) const {
|
|
|
|
|
const DfgSplicePacked* const thatp = that.as<DfgSplicePacked>();
|
|
|
|
|
const size_t arity = this->arity();
|
|
|
|
|
for (size_t i = 0; i < arity; ++i) {
|
|
|
|
|
if (driverLsb(i) != thatp->driverLsb(i)) return false;
|
|
|
|
|
}
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
V3Hash DfgSplicePacked::selfHash() const {
|
|
|
|
|
V3Hash hash;
|
|
|
|
|
const size_t arity = this->arity();
|
|
|
|
|
for (size_t i = 0; i < arity; ++i) hash += driverLsb(i);
|
|
|
|
|
return hash;
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-06 12:26:11 +02:00
|
|
|
// DfgVertexVar ----------
|
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-06 12:26:11 +02:00
|
|
|
bool DfgVertexVar::selfEquals(const DfgVertex& that) const {
|
2025-07-01 23:55:08 +02:00
|
|
|
UASSERT_OBJ(nodep()->type() == that.as<DfgVertexVar>()->nodep()->type(), this,
|
|
|
|
|
"Both DfgVertexVar should be scoped or unscoped");
|
|
|
|
|
UASSERT_OBJ(nodep() != that.as<DfgVertexVar>()->nodep(), this,
|
|
|
|
|
"There should only be one DfgVertexVar for a given AstVar or AstVarScope");
|
2022-09-23 17:46:22 +02:00
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-06 12:26:11 +02:00
|
|
|
V3Hash DfgVertexVar::selfHash() const {
|
|
|
|
|
V3Hash hash;
|
2025-07-01 23:55:08 +02:00
|
|
|
hash += nodep()->name();
|
|
|
|
|
hash += varp()->varType();
|
2022-10-06 12:26:11 +02:00
|
|
|
return hash;
|
2022-09-27 01:06:50 +02:00
|
|
|
}
|
|
|
|
|
|
2022-09-23 17:46:22 +02:00
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
// DfgVisitor
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
#include "V3Dfg__gen_visitor_defns.h" // From ./astgen
|