Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
Its capabilities cover a combination of those of V3Const and V3Gate, but
it is also more capable of transforming combinational logic into simplified
forms and more.
This entails adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes it takes over the
role of AstModule. The bulk of the DFG vertices represent expressions.
These vertex classes and the corresponding conversions to and from the AST
are mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) type.
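As a rough illustration of vertices that represent expressions and of dispatch on vertex type, here is a minimal sketch of a data-flow graph whose vertices are operations, paired with a classic double-dispatch visitor. All names (`Vertex`, `Visitor`, `Evaluator`, `Graph` and the vertex structs) are hypothetical and far simpler than the astgen-generated `DfgVertex` classes; the sketch assumes C++14.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature data-flow graph. Each vertex is one operation and
// points at the vertices producing its operands. Not Verilator's classes.
struct ConstV;
struct VarV;
struct NotV;
struct AndV;

// One visit() overload per vertex type: vtx.accept(visitor) selects the
// overload matching the dynamic type of vtx (double dispatch).
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(ConstV&) = 0;
    virtual void visit(VarV&) = 0;
    virtual void visit(NotV&) = 0;
    virtual void visit(AndV&) = 0;
};

struct Vertex {
    virtual ~Vertex() = default;
    virtual void accept(Visitor& v) = 0;
};

struct ConstV final : Vertex {
    std::uint32_t value;
    explicit ConstV(std::uint32_t v) : value{v} {}
    void accept(Visitor& v) override { v.visit(*this); }
};

struct VarV final : Vertex {
    std::uint32_t value;  // current value of the variable
    explicit VarV(std::uint32_t v) : value{v} {}
    void accept(Visitor& v) override { v.visit(*this); }
};

struct NotV final : Vertex {
    Vertex* src;
    explicit NotV(Vertex* s) : src{s} {}
    void accept(Visitor& v) override { v.visit(*this); }
};

struct AndV final : Vertex {
    Vertex* lhs;
    Vertex* rhs;
    AndV(Vertex* l, Vertex* r) : lhs{l}, rhs{r} {}
    void accept(Visitor& v) override { v.visit(*this); }
};

// The graph owns its vertices, loosely like a DfgGraph owns its vertices.
struct Graph {
    std::vector<std::unique_ptr<Vertex>> vertices;
    template <typename T, typename... Args>
    T* add(Args&&... args) {
        vertices.push_back(std::make_unique<T>(std::forward<Args>(args)...));
        return static_cast<T*>(vertices.back().get());
    }
};

// Example visitor: computes the value of the expression rooted at a vertex.
struct Evaluator final : Visitor {
    std::uint32_t result = 0;
    std::uint32_t eval(Vertex& vtx) {
        vtx.accept(*this);
        return result;
    }
    void visit(ConstV& c) override { result = c.value; }
    void visit(VarV& x) override { result = x.value; }
    void visit(NotV& n) override { result = ~eval(*n.src); }
    void visit(AndV& a) override {
        const std::uint32_t l = eval(*a.lhs);
        result = l & eval(*a.rhs);
    }
};
```

New operations are added by writing another visitor, without touching the vertex classes; adding a new vertex type is the costly direction, which is one reason generating these classes mechanically is attractive.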
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
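Common sub-expression elimination amounts to ensuring that each distinct operation on the same operands exists only once in the graph. A minimal sketch of that idea using value numbering (hash-consing), with an ordered `std::map` as the lookup table; the `CseTable` class, the string opcodes and the commutativity rule are assumptions for illustration, not the DFG pass itself:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical node: an opcode plus up to two operand ids (-1 if unused).
// Leaves (variables) use their name as the opcode.
struct Node {
    std::string op;
    int lhs;
    int rhs;
};

// Value numbering / hash-consing: before creating a node, look up a
// structurally identical one and reuse it if it exists.
class CseTable {
    std::vector<Node> m_nodes;
    std::map<std::tuple<std::string, int, int>, int> m_index;  // key -> node id

public:
    int make(const std::string& op, int lhs = -1, int rhs = -1) {
        // Canonicalize commutative operators so 'a & b' and 'b & a' match.
        if ((op == "and" || op == "or" || op == "xor") && lhs > rhs) std::swap(lhs, rhs);
        const auto key = std::make_tuple(op, lhs, rhs);
        const auto it = m_index.find(key);
        if (it != m_index.end()) return it->second;  // common sub-expression found
        const int id = static_cast<int>(m_nodes.size());
        m_nodes.push_back(Node{op, lhs, rhs});
        m_index.emplace(key, id);
        return id;
    }
    std::size_t size() const { return m_nodes.size(); }
};
```

Because operands are looked up by id, sharing propagates upwards: once `a & b` and `b & a` map to the same node, `~(a & b)` and `~(b & a)` do as well.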
The pre-inlining pass operates on smaller graphs and hence converges
faster, but can still substantially reduce the size of the logic on some
designs, making inlining both faster and less memory intensive. The
post-inlining pass can then optimize across the boundaries of inlined
modules. No optimization is performed across a module
boundary.
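The source below also provides `DfgGraph::splitIntoComponents`, which partitions a graph into weakly connected components with a depth-first traversal that follows both the sources and the sinks of each vertex. A standalone sketch of the same traversal over a plain adjacency list; the `Graph` alias and the `weakComponents` function are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Directed graph: g[v] lists the sinks of vertex v; vertices are 0..n-1.
using Graph = std::vector<std::vector<int>>;

// Weakly connected components: two vertices get the same number iff they are
// connected when edge directions are ignored. The traversal follows both
// sinks and sources. Returns the number of components.
std::size_t weakComponents(const Graph& g, std::vector<int>& component) {
    // Build source (predecessor) lists so edges can be walked backwards.
    Graph sources(g.size());
    for (std::size_t v = 0; v < g.size(); ++v) {
        for (int w : g[v]) sources[w].push_back(static_cast<int>(v));
    }

    component.assign(g.size(), -1);  // -1: not yet assigned
    std::size_t count = 0;
    for (std::size_t start = 0; start < g.size(); ++start) {
        if (component[start] != -1) continue;
        // Explicit work stack for the depth-first traversal
        std::vector<int> queue{static_cast<int>(start)};
        while (!queue.empty()) {
            const int v = queue.back();
            queue.pop_back();
            if (component[v] != -1) continue;  // already visited
            component[v] = static_cast<int>(count);
            for (int w : g[v]) queue.push_back(w);
            for (int u : sources[v]) queue.push_back(u);
        }
        ++count;
    }
    return count;
}
```

Following sources as well as sinks is what makes the components weak: with sinks alone, a vertex that only feeds others would not reach its siblings that share a common sink.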
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
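A peephole rule such as remove-not-not rewrites `~~x` to `x`. The sketch below shows one way such a rule can be switched off by name and applied repeatedly until nothing changes. The `Expr` representation, the `DisabledSet` plumbing and the function names are hypothetical; only the rule name `remove-not-not` comes from the text above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical expression: leaves have op "var"; "not" has one operand.
struct Expr {
    std::string op;
    int src;  // operand index for "not", -1 for leaves
};

// Names of disabled peephole rules, e.g. collected from options of the form
// -fno-dfg-peephole-<OPT>. (Hypothetical plumbing.)
using DisabledSet = std::set<std::string>;

// One application of the rule: returns the replacement for 'id', i.e. x for
// ~~x, or 'id' itself if the pattern does not match or the rule is disabled.
int removeNotNot(const std::vector<Expr>& g, int id, const DisabledSet& disabled) {
    if (disabled.count("remove-not-not")) return id;
    const Expr& e = g[id];
    if (e.op != "not") return id;
    const Expr& inner = g[e.src];
    if (inner.op != "not") return id;
    return inner.src;  // skip both inversions
}

// Apply the rule until it no longer changes anything (a fixed point), so
// that ~~~~x also collapses to x.
int simplify(const std::vector<Expr>& g, int id, const DisabledSet& disabled) {
    while (true) {
        const int next = removeNotNot(g, id, disabled);
        if (next == id) return id;
        id = next;
    }
}
```

Keeping each rule behind a named check makes it cheap to bisect a miscompilation down to a single pattern, which is the purpose of the per-optimization options described above.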
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yield a ~30% single-threaded speedup and a ~50% speedup on 4 threads.
Not having to carry redundant combinational networks through the rest of
the compilation pipeline also helps with memory consumption: a reduction
of up to 30% in Verilator's peak memory usage was observed on the same
design.
Gains on other designs are smaller (and could be improved by analyzing
those designs). For example, OpenTitan gains a 1-15% speedup, depending
on the build type.
2022-09-23 17:46:22 +02:00
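The source below orders vertices with `DfgGraph::sortTopologically`, an iterative depth-first search that marks each vertex as scheduled, on the current path, or finished, and reports failure when it reaches a vertex that is already on the path (a cycle). A standalone sketch of the same scheme over an adjacency list, assuming integer vertex ids; `topoSort` and `Graph` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Directed graph: g[v] lists the sinks (consumers) of vertex v.
using Graph = std::vector<std::vector<int>>;

// Iterative depth-first topological sort with cycle detection. A stack entry
// (v, false) means "visit v"; (v, true) means "all sinks of v are done".
// Returns false if the graph has a cycle; otherwise fills 'order' so that
// every vertex comes before all of its sinks.
bool topoSort(const Graph& g, std::vector<int>& order) {
    enum class Mark { Unvisited, OnPath, Finished };
    std::vector<Mark> marks(g.size(), Mark::Unvisited);
    std::vector<std::pair<int, bool>> stack;
    std::vector<int> post;  // vertices in reverse topological order

    for (std::size_t root = 0; root < g.size(); ++root) {
        if (marks[root] != Mark::Unvisited) continue;
        stack.emplace_back(static_cast<int>(root), false);
        while (!stack.empty()) {
            const int v = stack.back().first;
            const bool onPath = stack.back().second;
            stack.pop_back();
            if (onPath) {  // all sinks finished, so v is finished too
                marks[v] = Mark::Finished;
                post.push_back(v);
                continue;
            }
            if (marks[v] == Mark::Finished) continue;  // reached via another path
            if (marks[v] == Mark::OnPath) return false;  // back edge: cycle
            marks[v] = Mark::OnPath;
            stack.emplace_back(v, true);
            for (int w : g[v]) {
                if (marks[w] != Mark::Finished) stack.emplace_back(w, false);
            }
        }
    }
    order.assign(post.rbegin(), post.rend());
    return true;
}
```

Vertices are emitted in post-order, which is a reverse topological order; this is why the original code keeps `order` reversed and picks the iteration direction from its `reverse` argument.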
// -*- mode: C++; c-file-style: "cc-mode" -*-
//*************************************************************************
// DESCRIPTION: Verilator: Data flow graph (DFG) representation of logic
//
// Code available from: https://verilator.org
//
//*************************************************************************
//
// Copyright 2003-2022 by Wilson Snyder. This program is free software; you
// can redistribute it and/or modify it under the terms of either the GNU
// Lesser General Public License Version 3 or the Perl Artistic License
// Version 2.0.
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
//
//*************************************************************************

#include "config_build.h"
#include "verilatedos.h"

#include "V3Dfg.h"

#include "V3File.h"

#include <cctype>
#include <unordered_map>

//------------------------------------------------------------------------------
// DfgGraph
//------------------------------------------------------------------------------

DfgGraph::DfgGraph(AstModule& module, const string& name)
    : m_modulep{&module}
    , m_name{name} {}

DfgGraph::~DfgGraph() {
    forEachVertex([](DfgVertex& vtxp) { delete &vtxp; });
}

void DfgGraph::addGraph(DfgGraph& other) {
    other.forEachVertex([&](DfgVertex& vtx) {
        other.removeVertex(vtx);
        this->addVertex(vtx);
    });
}

bool DfgGraph::sortTopologically(bool reverse) {
    // Vertices in reverse topological order
    std::vector<DfgVertex*> order;

    // Markings for algorithm
    enum class Mark : uint8_t { Scheduled, OnPath, Finished };
    std::unordered_map<DfgVertex*, Mark> marks;

    // Stack of nodes in depth first search. The second element of the pair is true if the vertex
    // is on the current DFS path, and false if it's only scheduled for visitation.
    std::vector<std::pair<DfgVertex*, bool>> stack;

    // Schedule vertex for visitation
    const auto schedule = [&](DfgVertex& vtx) {
        // Nothing to do if already finished
        if (marks.emplace(&vtx, Mark::Scheduled).first->second == Mark::Finished) return;
        // Otherwise schedule for visitation
        stack.emplace_back(&vtx, false);
    };

    // For each vertex (direct loop, so we can return early)
    for (DfgVertex* vtxp = m_vertices.begin(); vtxp; vtxp = vtxp->m_verticesEnt.nextp()) {
        // Initiate DFS from this vertex
        schedule(*vtxp);
        while (!stack.empty()) {
            // Pick up stack top
            const auto pair = stack.back();
            stack.pop_back();
            DfgVertex* const currp = pair.first;
            const bool onPath = pair.second;
            Mark& mark = marks.at(currp);

            if (onPath) {  // Popped node on path
                // Mark it as done
                UASSERT_OBJ(mark == Mark::OnPath, currp, "DFS got lost");
                mark = Mark::Finished;
                // Add to order
                order.push_back(currp);
            } else {  // Otherwise node was scheduled for visitation, so visit it
                // If already finished, then nothing to do
                if (mark == Mark::Finished) continue;
                // If already on path, then not a DAG
                if (mark == Mark::OnPath) return false;
                // Push to path and mark as such
                mark = Mark::OnPath;
                stack.emplace_back(currp, true);
                // Schedule children
                currp->forEachSink(schedule);
            }
        }
    }

    // Move given vertex to end of vertex list
    const auto reinsert = [this](DfgVertex& vtx) {
        // Remove from current location
        removeVertex(vtx);
        // 'addVertex' appends to the end of the vertex list, so can do this in one loop
        addVertex(vtx);
    };

    // Remember 'order' is in reverse topological order
    if (!reverse) {
        for (DfgVertex* vtxp : vlstd::reverse_view(order)) reinsert(*vtxp);
    } else {
        for (DfgVertex* vtxp : order) reinsert(*vtxp);
    }

    // Done
    return true;
}

std::vector<std::unique_ptr<DfgGraph>> DfgGraph::splitIntoComponents() {
    size_t componentNumber = 0;
    std::unordered_map<const DfgVertex*, unsigned> vertex2component;

    forEachVertex([&](const DfgVertex& vtx) {
        // If already assigned this vertex to a component, then continue
        if (vertex2component.count(&vtx)) return;

        // Work queue for depth first traversal starting from this vertex
        std::vector<const DfgVertex*> queue{&vtx};

        // Depth first traversal
        while (!queue.empty()) {
            // Pop next work item
            const DfgVertex& item = *queue.back();
            queue.pop_back();

            // Mark vertex as belonging to current component (if it's not marked yet)
            const bool isFirstEncounter = vertex2component.emplace(&item, componentNumber).second;

            // If we have already visited this vertex during the traversal, then move on.
            if (!isFirstEncounter) continue;

            // Enqueue all sources and sinks of this vertex.
            item.forEachSource([&](const DfgVertex& src) { queue.push_back(&src); });
            item.forEachSink([&](const DfgVertex& dst) { queue.push_back(&dst); });
        }

        // Done with this component
        ++componentNumber;
    });

    // Create the component graphs
    std::vector<std::unique_ptr<DfgGraph>> results{componentNumber};

    for (size_t i = 0; i < componentNumber; ++i) {
        results[i].reset(new DfgGraph{*m_modulep, name() + "-component-" + cvtToStr(i)});
    }

    // Move all vertices under the corresponding component graphs
    forEachVertex([&](DfgVertex& vtx) {
        this->removeVertex(vtx);
        results[vertex2component[&vtx]]->addVertex(vtx);
    });

    UASSERT(size() == 0, "'this' DfgGraph should have been emptied");

    return results;
}

void DfgGraph::runToFixedPoint(std::function<bool(DfgVertex&)> f) {
    bool changed;
    const auto apply = [&](DfgVertex& vtx) -> void {
        if (f(vtx)) changed = true;
    };
    while (true) {
        // Do one pass over the graph.
        changed = false;
        forEachVertex(apply);
        if (!changed) break;
        // Do another pass in the opposite direction. Alternating directions reduces
        // the pathological complexity with left/right leaning trees.
        changed = false;
        forEachVertexInReverse(apply);
        if (!changed) break;
    }
}

static const string toDotId(const DfgVertex& vtx) { return '"' + cvtToHex(&vtx) + '"'; }

// Dump one DfgVertex in Graphviz format
static void dumpDotVertex(std::ostream& os, const DfgVertex& vtx) {
    os << toDotId(vtx);
2022-09-27 01:06:50 +02:00
    if (const DfgVarPacked* const varVtxp = vtx.cast<DfgVarPacked>()) {
        AstVar* const varp = varVtxp->varp();
        os << " [label=\"" << varp->name() << "\nW" << varVtxp->width() << " / F"
           << varVtxp->fanout() << '"';

        if (varp->direction() == VDirection::INPUT) {
            os << ", shape=box, style=filled, fillcolor=chartreuse2";  // Green
        } else if (varp->direction() == VDirection::OUTPUT) {
            os << ", shape=box, style=filled, fillcolor=cyan2";  // Cyan
        } else if (varp->direction() == VDirection::INOUT) {
            os << ", shape=box, style=filled, fillcolor=darkorchid2";  // Purple
        } else if (varVtxp->hasExtRefs()) {
            os << ", shape=box, style=filled, fillcolor=firebrick2";  // Red
        } else if (varVtxp->hasModRefs()) {
            os << ", shape=box, style=filled, fillcolor=gold2";  // Yellow
        } else if (varVtxp->keep()) {
            os << ", shape=box, style=filled, fillcolor=grey";
        } else {
            os << ", shape=box";
        }
        os << "]" << endl;
        return;
    }

    if (const DfgVarArray* const arrVtxp = vtx.cast<DfgVarArray>()) {
|
|
|
|
|
AstVar* const varp = arrVtxp->varp();
|
|
|
|
|
os << " [label=\"" << varp->name() << "[]\"";
|
|
|
|
|
if (varp->direction() == VDirection::INPUT) {
|
|
|
|
|
os << ", shape=box3d, style=filled, fillcolor=chartreuse2"; // Green
|
|
|
|
|
} else if (varp->direction() == VDirection::OUTPUT) {
|
|
|
|
|
os << ", shape=box3d, style=filled, fillcolor=cyan2"; // Cyan
|
|
|
|
|
} else if (varp->direction() == VDirection::INOUT) {
|
|
|
|
|
os << ", shape=box3d, style=filled, fillcolor=darkorchid2"; // Purple
|
|
|
|
|
} else if (arrVtxp->hasExtRefs()) {
|
|
|
|
|
os << ", shape=box3d, style=filled, fillcolor=firebrick2"; // Red
|
|
|
|
|
} else if (arrVtxp->hasModRefs()) {
|
|
|
|
|
os << ", shape=box3d, style=filled, fillcolor=gold2"; // Yellow
|
|
|
|
|
} else if (arrVtxp->keep()) {
|
|
|
|
|
os << ", shape=box3d, style=filled, fillcolor=grey";
|
|
|
|
|
} else {
|
|
|
|
|
os << ", shape=box3d";
|
|
|
|
|
}
|
|
|
|
|
os << "]" << endl;
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (const DfgConst* const constVtxp = vtx.cast<DfgConst>()) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
const V3Number& num = constVtxp->constp()->num();
|
|
|
|
|
os << " [label=\"";
|
|
|
|
|
if (num.width() <= 32 && !num.isSigned()) {
|
|
|
|
|
const bool feedsSel = !constVtxp->findSink<DfgVertex>([](const DfgVertex& vtx) { //
|
2022-09-27 01:06:50 +02:00
|
|
|
return !vtx.is<DfgSel>() && !vtx.is<DfgArraySel>();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
});
|
|
|
|
|
if (feedsSel) {
|
|
|
|
|
os << num.toUInt();
|
|
|
|
|
} else {
|
|
|
|
|
os << constVtxp->width() << "'d" << num.toUInt() << "\n";
|
|
|
|
|
os << constVtxp->width() << "'h" << std::hex << num.toUInt() << std::dec;
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
os << num.ascii();
|
|
|
|
|
}
|
|
|
|
|
os << '"';
|
|
|
|
|
os << ", shape=plain";
|
2022-09-27 01:06:50 +02:00
|
|
|
os << "]" << endl;
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
os << " [label=\"" << vtx.typeName() << "\nW" << vtx.width() << " / F" << vtx.fanout() << '"';
|
|
|
|
|
if (vtx.hasMultipleSinks()) {
|
|
|
|
|
os << ", shape=doublecircle";
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
    } else {
        os << ", shape=circle";
    }
    os << "]" << endl;
}

// Dump one DfgEdge in Graphviz format
static void dumpDotEdge(std::ostream& os, const DfgEdge& edge, const string& headlabel) {
    os << toDotId(*edge.sourcep()) << " -> " << toDotId(*edge.sinkp());
    if (!headlabel.empty()) os << " [headlabel=\"" << headlabel << "\"]";
    os << endl;
}

// Dump one DfgVertex and all of its source DfgEdges in Graphviz format
static void dumpDotVertexAndSourceEdges(std::ostream& os, const DfgVertex& vtx) {
    dumpDotVertex(os, vtx);
    vtx.forEachSourceEdge([&](const DfgEdge& edge, size_t idx) {  //
        if (edge.sourcep()) {
            string headLabel;
            if (vtx.arity() > 1) headLabel = vtx.srcName(idx);
            dumpDotEdge(os, edge, headLabel);
        }
    });
}

void DfgGraph::dumpDot(std::ostream& os, const string& label) const {
    // Header
    os << "digraph dfg {" << endl;
    os << "graph [label=\"" << name();
    if (!label.empty()) os << "-" << label;
    os << "\", labelloc=t, labeljust=l]" << endl;
    os << "graph [rankdir=LR]" << endl;

    // Emit all vertices
    forEachVertex([&](const DfgVertex& vtx) { dumpDotVertexAndSourceEdges(os, vtx); });

    // Footer
    os << "}" << endl;
}

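For readers unfamiliar with the Graphviz output, the standalone sketch below mimics the structure that `dumpDot`, `dumpDotVertexAndSourceEdges` and `dumpDotEdge` produce. It emits a `digraph` header with a label and `rankdir=LR`, one node per vertex, and one edge per connected source. An edge carries a `headlabel` naming the operand only when the sink has more than one input. `ToyVertex`, `emitEdge` and `toDot` are hypothetical names. The `v<id>` node names and the plain operation label are simplifications, not the exact format of `toDotId`/`dumpDotVertex`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a DfgVertex: an id, an operation name, the
// vertices feeding each operand (nullptr = unconnected), and operand names.
struct ToyVertex {
    int id;
    std::string op;
    std::vector<const ToyVertex*> srcs;
    std::vector<std::string> srcNames;  // e.g. "lhs", "rhs"
};

// Emit one edge, adding a headlabel only when one is given
static void emitEdge(std::ostream& os, const ToyVertex& src, const ToyVertex& dst,
                     const std::string& headlabel) {
    os << "v" << src.id << " -> v" << dst.id;
    if (!headlabel.empty()) os << " [headlabel=\"" << headlabel << "\"]";
    os << "\n";
}

// Emit the whole graph: header, each vertex followed by its source edges, footer
static std::string toDot(const std::string& name, const std::vector<ToyVertex>& vtxs) {
    std::ostringstream os;
    os << "digraph dfg {\n";
    os << "graph [label=\"" << name << "\", labelloc=t, labeljust=l]\n";
    os << "graph [rankdir=LR]\n";
    for (const ToyVertex& v : vtxs) {
        // Multi-input vertices need one name per operand
        assert(v.srcs.size() <= 1 || v.srcNames.size() == v.srcs.size());
        os << "v" << v.id << " [label=\"" << v.op << "\"]\n";
        for (std::size_t i = 0; i < v.srcs.size(); ++i) {
            if (!v.srcs[i]) continue;  // skip unconnected inputs
            // Operand names only disambiguate vertices with several inputs
            const std::string label = v.srcs.size() > 1 ? v.srcNames[i] : "";
            emitEdge(os, *v.srcs[i], v, label);
        }
    }
    os << "}\n";
    return os.str();
}
```

Passing the resulting text to Graphviz's `dot` tool (for example `dot -Tsvg`) renders the graph with data flowing left to right.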
void DfgGraph::dumpDotFile(const string& fileName, const string& label) const {
    // This generates a file used by graphviz, https://www.graphviz.org
    // "hardcoded" parameters:
    const std::unique_ptr<std::ofstream> os{V3File::new_ofstream(fileName)};
    if (os->fail()) v3fatal("Cannot write to file: " << fileName);
    dumpDot(*os.get(), label);
    os->close();
}

void DfgGraph::dumpDotFilePrefixed(const string& label) const {
    string fileName = name();
    if (!label.empty()) fileName += "-" + label;
    dumpDotFile(v3Global.debugFilename(fileName) + ".dot", label);
}

// Dump upstream logic cone starting from given vertex
static void dumpDotUpstreamConeFromVertex(std::ostream& os, const DfgVertex& vtx) {
    // Work queue for depth first traversal starting from this vertex
    std::vector<const DfgVertex*> queue{&vtx};

    // Set of already visited vertices
    std::unordered_set<const DfgVertex*> visited;

    // Depth first traversal
    while (!queue.empty()) {
        // Pop next work item
        const DfgVertex* const itemp = queue.back();
        queue.pop_back();

        // Mark vertex as visited
        const bool isFirstEncounter = visited.insert(itemp).second;

        // If we have already visited this vertex during the traversal, then move on.
        if (!isFirstEncounter) continue;

        // Enqueue all sources of this vertex.
        itemp->forEachSource([&](const DfgVertex& src) { queue.push_back(&src); });

        // Emit this vertex and all of its source edges
        dumpDotVertexAndSourceEdges(os, *itemp);
    }

    // Emit all DfgVarPacked vertices that have external references driven by this vertex
    vtx.forEachSink([&](const DfgVertex& dst) {
        if (const DfgVarPacked* const varVtxp = dst.cast<DfgVarPacked>()) {
            if (varVtxp->hasRefs()) dumpDotVertexAndSourceEdges(os, dst);
        }
    });
}

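The traversal above is an ordinary iterative depth-first search. It uses a `std::vector` as a stack, plus a visited set so that vertices shared by several expressions (and any cycles) are processed only once. The sketch below isolates that pattern on a plain adjacency list. `SourceLists` and `upstreamCone` are hypothetical names. Recording the visit order stands in for emitting each vertex with `dumpDotVertexAndSourceEdges`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical adjacency representation: sources[v] lists the vertices that
// feed vertex v (its operands), similar to iterating DfgVertex::forEachSource.
using SourceLists = std::vector<std::vector<std::size_t>>;

// Collect the upstream logic cone of 'root' in visiting order, using an
// explicit stack (depth first) and a visited set.
std::vector<std::size_t> upstreamCone(const SourceLists& sources, std::size_t root) {
    assert(root < sources.size());
    std::vector<std::size_t> stack{root};
    std::unordered_set<std::size_t> visited;
    std::vector<std::size_t> order;
    while (!stack.empty()) {
        const std::size_t item = stack.back();
        stack.pop_back();
        // insert(...).second is false if the vertex was already visited
        if (!visited.insert(item).second) continue;
        for (const std::size_t src : sources[item]) stack.push_back(src);
        order.push_back(item);  // the real code emits the vertex at this point
    }
    return order;
}
```

In this sketch, vertices that are not upstream of the root never appear in the result. A cycle terminates because the second encounter of a vertex is skipped.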
// LCOV_EXCL_START // Debug function for developer use only
void DfgGraph::dumpDotUpstreamCone(const string& fileName, const DfgVertex& vtx,
                                   const string& name) const {
    // Open output file
    const std::unique_ptr<std::ofstream> os{V3File::new_ofstream(fileName)};
    if (os->fail()) v3fatal("Cannot write to file: " << fileName);

    // Header
    *os << "digraph dfg {" << endl;
    if (!name.empty()) *os << "graph [label=\"" << name << "\", labelloc=t, labeljust=l]" << endl;
    *os << "graph [rankdir=LR]" << endl;

    // Dump the cone
    dumpDotUpstreamConeFromVertex(*os, vtx);

    // Footer
    *os << "}" << endl;

    // Done
    os->close();
}
// LCOV_EXCL_STOP

void DfgGraph::dumpDotAllVarConesPrefixed(const string& label) const {
    const string prefix = label.empty() ? name() + "-cone-" : name() + "-" + label + "-cone-";
    forEachVertex([&](const DfgVertex& vtx) {
        // Check if this vertex drives a variable referenced outside the DFG.
        const DfgVarPacked* const sinkp
            = vtx.findSink<DfgVarPacked>([](const DfgVarPacked& sink) {  //
                  return sink.hasRefs();
              });

        // We only dump cones driving an externally referenced variable
        if (!sinkp) return;

        // Open output file
        const string coneName{prefix + sinkp->varp()->name()};
        const string fileName{v3Global.debugFilename(coneName) + ".dot"};
        const std::unique_ptr<std::ofstream> os{V3File::new_ofstream(fileName)};
        if (os->fail()) v3fatal("Cannot write to file: " << fileName);

        // Header
        *os << "digraph dfg {" << endl;
        *os << "graph [label=\"" << coneName << "\", labelloc=t, labeljust=l]" << endl;
        *os << "graph [rankdir=LR]" << endl;

        // Dump this cone
        dumpDotUpstreamConeFromVertex(*os, vtx);

        // Footer
        *os << "}" << endl;

        // Done with this logic cone
        os->close();
    });
}

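The per-cone naming and filtering in `dumpDotAllVarConesPrefixed` can be illustrated in isolation. The label segment appears only when the label is non-empty, and a vertex is skipped unless one of the variables it drives is referenced outside the DFG. `ToyVarSink`, `conePrefix` and `coneNameFor` are hypothetical names. The sketch assumes `findSink` returns the first matching sink. It does not model `v3Global.debugFilename`, which the real code applies before appending `.dot`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DfgVarPacked sink: the variable's name and
// whether it is referenced outside the DFG (cf. DfgVarPacked::hasRefs()).
struct ToyVarSink {
    std::string name;
    bool hasRefs;
};

// Same prefix rule as the function above: the label segment is only added
// when the label is non-empty.
std::string conePrefix(const std::string& graphName, const std::string& label) {
    return label.empty() ? graphName + "-cone-" : graphName + "-" + label + "-cone-";
}

// Cone name for a vertex given the variables it drives, or "" when none of
// them is externally referenced (such a vertex is skipped). The first
// matching sink is used.
std::string coneNameFor(const std::string& prefix, const std::vector<ToyVarSink>& sinks) {
    for (const ToyVarSink& sink : sinks) {
        if (sink.hasRefs) return prefix + sink.name;
    }
    return "";
}
```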
//------------------------------------------------------------------------------
// DfgEdge
//------------------------------------------------------------------------------

void DfgEdge::unlinkSource() {
    if (!m_sourcep) return;
#ifdef VL_DEBUG
    {
        DfgEdge* sinkp = m_sourcep->m_sinksp;
        while (sinkp) {
            if (sinkp == this) break;
            sinkp = sinkp->m_nextp;
        }
        UASSERT(sinkp, "'m_sourcep' does not have this edge as sink");
    }
#endif
    // Relink pointers of predecessor and successor
    if (m_prevp) m_prevp->m_nextp = m_nextp;
    if (m_nextp) m_nextp->m_prevp = m_prevp;
    // If head of list in source, update source's head pointer
    if (m_sourcep->m_sinksp == this) m_sourcep->m_sinksp = m_nextp;
    // Mark source as unconnected
    m_sourcep = nullptr;
    // Clear links. This is not strictly necessary, but might catch bugs.
    m_prevp = nullptr;
    m_nextp = nullptr;
}

void DfgEdge::relinkSource(DfgVertex* newSourcep) {
    // Unlink current source, if any
    unlinkSource();
    // Link new source
    m_sourcep = newSourcep;
    // Prepend to sink list in source
    m_nextp = newSourcep->m_sinksp;
    if (m_nextp) m_nextp->m_prevp = this;
    newSourcep->m_sinksp = this;
}

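`unlinkSource` and `relinkSource` maintain an intrusive doubly linked list of sink edges, with the list head stored in the source vertex. Detaching or retargeting an edge is therefore O(1) and needs no allocation. The standalone sketch below reproduces those pointer updates. It also includes the `while (m_sinksp) ... relinkSource(...)` loop used by `DfgVertex::replaceWith`. The `Vertex`/`Edge` structs and the `fanout` helper are hypothetical simplifications: the real classes carry much more state, and the `VL_DEBUG` membership check is omitted here.

```cpp
#include <cassert>

// Minimal model of DFG edge bookkeeping (hypothetical simplification): each
// vertex holds the head of an intrusive doubly linked list of the edges it
// drives; each edge knows its source and its list neighbours.
struct Edge;

struct Vertex {
    Edge* sinksp = nullptr;  // head of the list of edges driven by this vertex
};

struct Edge {
    Vertex* sourcep = nullptr;
    Edge* prevp = nullptr;
    Edge* nextp = nullptr;

    // O(1) removal from the source's sink list (cf. DfgEdge::unlinkSource)
    void unlinkSource() {
        if (!sourcep) return;
        if (prevp) prevp->nextp = nextp;
        if (nextp) nextp->prevp = prevp;
        if (sourcep->sinksp == this) sourcep->sinksp = nextp;  // we were the head
        sourcep = nullptr;
        prevp = nullptr;
        nextp = nullptr;
    }

    // Detach from the current source, then prepend to the new source's list
    // (cf. DfgEdge::relinkSource)
    void relinkSource(Vertex* newSourcep) {
        unlinkSource();
        sourcep = newSourcep;
        nextp = newSourcep->sinksp;
        if (nextp) nextp->prevp = this;
        newSourcep->sinksp = this;
    }
};

// Redirect every edge driven by 'oldVtx' to 'newSourcep'. Each relink removes
// the current head, so the loop ends once the list is empty. Relinking onto
// the same vertex would never terminate, hence the assertion.
void replaceWith(Vertex& oldVtx, Vertex* newSourcep) {
    assert(newSourcep != &oldVtx);
    while (oldVtx.sinksp) oldVtx.sinksp->relinkSource(newSourcep);
}

// Number of edges currently driven by 'v'
int fanout(const Vertex& v) {
    int n = 0;
    for (const Edge* e = v.sinksp; e; e = e->nextp) ++n;
    return n;
}
```

Prepending keeps insertion O(1). The cost is that the sink list is ordered from most to least recently linked.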
//------------------------------------------------------------------------------
// DfgVertex
//------------------------------------------------------------------------------

DfgVertex::DfgVertex(DfgGraph& dfg, FileLine* flp, AstNodeDType* dtypep, DfgType type)
    : m_filelinep{flp}
    , m_dtypep{dtypep}
    , m_type{type} {
    dfg.addVertex(*this);
}

DfgVertex::~DfgVertex() {
    // TODO: It would be best to intern these via AstTypeTable to save the effort
    if (VN_IS(m_dtypep, UnpackArrayDType)) VL_DO_DANGLING(delete m_dtypep, m_dtypep);
}

bool DfgVertex::selfEquals(const DfgVertex& that) const {
    return this->m_type == that.m_type && this->dtypep() == that.dtypep();
}

V3Hash DfgVertex::selfHash() const { return V3Hash{m_type} + width(); }

bool DfgVertex::equals(const DfgVertex& that, EqualsCache& cache) const {
    if (this == &that) return true;
    if (!this->selfEquals(that)) return false;

    const auto key = (this < &that) ? EqualsCache::key_type{this, &that}  //
                                    : EqualsCache::key_type{&that, this};
    const auto pair = cache.emplace(key, true);
    bool& result = pair.first->second;
    if (pair.second) {
        auto thisPair = this->sourceEdges();
        const DfgEdge* const thisSrcEdgesp = thisPair.first;
        const size_t thisArity = thisPair.second;
        auto thatPair = that.sourceEdges();
        const DfgEdge* const thatSrcEdgesp = thatPair.first;
        const size_t thatArity = thatPair.second;
        UASSERT_OBJ(thisArity == thatArity, this, "Same type vertices must have same arity!");
        for (size_t i = 0; i < thisArity; ++i) {
            const DfgVertex* const thisSrcVtxp = thisSrcEdgesp[i].m_sourcep;
            const DfgVertex* const thatSrcVtxp = thatSrcEdgesp[i].m_sourcep;
            if (thisSrcVtxp == thatSrcVtxp) continue;
            if (!thisSrcVtxp || !thatSrcVtxp || !thisSrcVtxp->equals(*thatSrcVtxp, cache)) {
                result = false;
                break;
            }
        }
    }
    return result;
}

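`equals` compares two vertices structurally and memoizes the answer per unordered pair. The pair is normalized so that `(a, b)` and `(b, a)` share one cache entry. The entry is seeded with `true` before the operands are compared, so a recursive query that reaches the same pair again returns the optimistic answer instead of recursing forever. The sketch below reproduces that scheme on a hypothetical `Node` type. `EqCache` and `nodeEquals` are illustrative names. The "self" check here compares an operation string and the operand count, where the real `selfEquals` compares vertex type and `dtypep()`. Pointer order is taken with `std::less`, which guarantees a total order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy vertex: an operation name plus ordered operand pointers
// (nullptr = unconnected), standing in for DfgVertex and its source edges.
struct Node {
    std::string op;
    std::vector<const Node*> srcs;
};

// Memo table keyed by a normalized pair of vertices (cf. DfgVertex::EqualsCache).
// std::map never invalidates references to its elements on insertion.
using EqCache = std::map<std::pair<const Node*, const Node*>, bool>;

bool nodeEquals(const Node& a, const Node& b, EqCache& cache) {
    if (&a == &b) return true;
    // "selfEquals": same operation and same number of operands
    if (a.op != b.op || a.srcs.size() != b.srcs.size()) return false;
    // Normalize the key so that (a, b) and (b, a) share one entry
    const auto key = std::less<const Node*>{}(&a, &b) ? std::make_pair(&a, &b)
                                                      : std::make_pair(&b, &a);
    // Seed with 'true' before recursing; revisiting this pair through a cycle
    // then returns the optimistic answer instead of recursing forever
    const auto ins = cache.emplace(key, true);
    bool& result = ins.first->second;
    if (ins.second) {
        for (std::size_t i = 0; i < a.srcs.size(); ++i) {
            const Node* const x = a.srcs[i];
            const Node* const y = b.srcs[i];
            if (x == y) continue;  // identical (or both unconnected) operands
            if (!x || !y || !nodeEquals(*x, *y, cache)) {
                result = false;
                break;
            }
        }
    }
    return result;
}
```

A consequence of the optimistic seed in this sketch is that two vertices sitting on structurally identical cycles compare equal.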
V3Hash DfgVertex::hash(HashCache& cache) const {
    const auto pair = cache.emplace(this, V3Hash{});
    V3Hash& result = pair.first->second;
    if (pair.second) {
        result += selfHash();
        // Variables are defined by themselves, so there is no need to hash the sources. This
        // enables sound hashing of graphs circular only through variables, which we rely on.
        if (!is<DfgVarPacked>() && !is<DfgVarArray>()) {
            forEachSource([&result, &cache](const DfgVertex& src) { result += src.hash(cache); });
        }
    }
    return result;
}

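`hash` memoizes one hash per vertex and deliberately does not descend into the sources of variable vertices. The sketch below follows the same rule on a hypothetical `HNode` type. A cycle that passes through a variable is cut at the variable, so every hash it reads is already complete, and the result does not depend on where hashing started. `HNode`, `HashCache`, `combine` and `nodeHash` are illustrative names. `std::size_t` with a boost-style mixing step replaces `V3Hash` here. That is an assumption, not Verilator's hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy vertex; 'isVar' marks variable vertices, which are hashed
// by themselves only (cf. the DfgVarPacked/DfgVarArray check above).
struct HNode {
    std::string op;
    bool isVar;
    std::vector<const HNode*> srcs;
};

// References into an unordered_map stay valid across rehashing, so holding
// 'result' while recursing (and inserting) is safe.
using HashCache = std::unordered_map<const HNode*, std::size_t>;

// Boost-style hash mixing (an assumption; not V3Hash)
static void combine(std::size_t& seed, std::size_t v) {
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

std::size_t nodeHash(const HNode& n, HashCache& cache) {
    const auto ins = cache.emplace(&n, 0);
    std::size_t& result = ins.first->second;
    if (ins.second) {
        // "selfHash": operation and operand count only
        combine(result, std::hash<std::string>{}(n.op));
        combine(result, n.srcs.size());
        // Do not descend through variables: this cuts any cycle that passes
        // through a variable before a partially computed hash could be read
        if (!n.isVar) {
            for (const HNode* src : n.srcs) {
                if (src) combine(result, nodeHash(*src, cache));
            }
        }
    }
    return result;
}
```

Equal hashes only nominate candidates. A pass such as common sub-expression elimination would still have to confirm a match with a structural comparison like `equals`.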
uint32_t DfgVertex::fanout() const {
    uint32_t result = 0;
    forEachSinkEdge([&](const DfgEdge&) { ++result; });
    return result;
}

void DfgVertex::unlinkDelete(DfgGraph& dfg) {
    // Unlink source edges
    forEachSourceEdge([](DfgEdge& edge, size_t) { edge.unlinkSource(); });
    // Unlink sink edges
    forEachSinkEdge([](DfgEdge& edge) { edge.unlinkSource(); });
    // Remove from graph
    dfg.removeVertex(*this);
    // Delete
    delete this;
}

void DfgVertex::replaceWith(DfgVertex* newSourcep) {
    while (m_sinksp) m_sinksp->relinkSource(newSourcep);
}

//------------------------------------------------------------------------------
|
|
|
|
|
// Vertex classes
|
|
|
|
|
//------------------------------------------------------------------------------
|
|
|
|
|
|
2022-09-27 01:06:50 +02:00
|
|
|
// DfgVarPacked ----------
|
|
|
|
|
void DfgVarPacked::accept(DfgVisitor& visitor) { visitor.visit(this); }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00

bool DfgVarPacked::selfEquals(const DfgVertex& that) const {
    if (const DfgVarPacked* otherp = that.cast<DfgVarPacked>()) {
        UASSERT_OBJ(varp() != otherp->varp(), this,
                    "There should only be one DfgVarPacked for a given AstVar");
    }
    return false;
}

V3Hash DfgVarPacked::selfHash() const { return V3Hasher::uncachedHash(varp()); }

// DfgVarArray ----------

void DfgVarArray::accept(DfgVisitor& visitor) { visitor.visit(this); }

bool DfgVarArray::selfEquals(const DfgVertex& that) const {
    if (const DfgVarArray* otherp = that.cast<DfgVarArray>()) {
        UASSERT_OBJ(varp() != otherp->varp(), this,
                    "There should only be one DfgVarArray for a given AstVar");
    }
    return false;
}

V3Hash DfgVarArray::selfHash() const { return V3Hasher::uncachedHash(varp()); }

// DfgConst ----------

void DfgConst::accept(DfgVisitor& visitor) { visitor.visit(this); }

bool DfgConst::selfEquals(const DfgVertex& that) const {
    if (const DfgConst* otherp = that.cast<DfgConst>()) {
        return constp()->sameTree(otherp->constp());
    }
    return false;
}

V3Hash DfgConst::selfHash() const { return V3Hasher::uncachedHash(m_constp); }

//------------------------------------------------------------------------------
// DfgVisitor
//------------------------------------------------------------------------------

void DfgVisitor::visit(DfgVarPacked* vtxp) { visit(static_cast<DfgVertex*>(vtxp)); }
void DfgVisitor::visit(DfgVarArray* vtxp) { visit(static_cast<DfgVertex*>(vtxp)); }
void DfgVisitor::visit(DfgConst* vtxp) { visit(static_cast<DfgVertex*>(vtxp)); }

//------------------------------------------------------------------------------
// 'astgen' generated definitions
//------------------------------------------------------------------------------

#include "V3Dfg__gen_definitions.h"