Introduce DFG based combinational logic optimizer (#3527)

Added a new data-flow graph (DFG) based combinational logic optimizer.
Its capabilities cover a combination of V3Const and V3Gate, but it is
also more capable of transforming combinational logic into simplified
forms.

This entails adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. The bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST, are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.

The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre-inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post-inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.

For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.

The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yield ~30% single threaded speedup, and ~50% speedup on 4 threads.
Not having to haul around redundant combinational networks in the rest
of the compilation pipeline also helps with memory consumption, and a
reduction of up to 30% in Verilator's peak memory usage was observed
on the same design.

Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
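
To make the idea of a peephole optimization such as remove-not-not concrete, here is a minimal sketch on a toy expression tree. The `Expr` type and the `mkVar`, `mkNot`, `removeNotNot` and `toString` helpers are hypothetical names invented for this illustration; they are not part of Verilator and do not reflect how V3DfgPeephole is implemented.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature expression node (an assumption for illustration only)
struct Expr {
    enum class Kind { Var, Not } kind;
    std::string name;  // Used when kind == Var
    std::shared_ptr<Expr> op;  // Used when kind == Not
};
using ExprP = std::shared_ptr<Expr>;

ExprP mkVar(const std::string& name) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Var, name, nullptr});
}
ExprP mkNot(ExprP op) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Not, "", std::move(op)});
}

// Peephole pattern: rewrite Not(Not(x)) to x, working bottom-up
ExprP removeNotNot(const ExprP& e) {
    if (e->kind != Expr::Kind::Not) return e;
    const ExprP inner = removeNotNot(e->op);
    // After simplifying the operand, a remaining Not under this Not cancels out
    if (inner->kind == Expr::Kind::Not) return inner->op;
    // Reuse the original node if nothing changed below it
    return inner == e->op ? e : mkNot(inner);
}

// Render the expression using '~' for negation
std::string toString(const ExprP& e) {
    return e->kind == Expr::Kind::Var ? e->name : "~" + toString(e->op);
}
```

For example, `toString(removeNotNot(mkNot(mkNot(mkVar("a")))))` yields the plain variable, while an odd number of negations collapses to a single one.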
// -*- mode: C++; c-file-style: "cc-mode" -*-
//*************************************************************************
// DESCRIPTION: Verilator: Convert DfgGraph to AstModule
//
// Code available from: https://verilator.org
//
//*************************************************************************
//
// Copyright 2003-2022 by Wilson Snyder. This program is free software; you
// can redistribute it and/or modify it under the terms of either the GNU
// Lesser General Public License Version 3 or the Perl Artistic License
// Version 2.0.
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
//
//*************************************************************************
//
// Convert DfgGraph back to AstModule. We recursively construct AstNodeMath expressions for each
// DfgVertex which represents a storage location (e.g.: DfgVarPacked), or has multiple sinks
// without driving a storage location (and hence needs a temporary variable to avoid
// duplication). The recursion stops when we reach a DfgVertex representing a storage location
// (e.g.: DfgVarPacked), or a vertex that has multiple sinks (as these nodes will have a
// [potentially new temporary] corresponding storage location). Redundant variables (those whose
// source vertex drives multiple variables) are eliminated when possible. Vertices driving
// multiple variables are rendered once, driving an arbitrarily (but deterministically) chosen
// canonical variable, and the corresponding redundant variables are assigned from the canonical
// variable.
//
//*************************************************************************
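
The conversion strategy described in the header comment can be sketched on a toy graph: operands are rendered recursively, and the recursion stops at leaves (standing in for storage locations) and at vertices with multiple sinks, which get a temporary so their expression is emitted only once. The `Vertex` and `Renderer` types, the `sinkCount` field and the `_tmp` naming are assumptions made for this illustration and are not the Verilator data structures.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature DAG vertex (not Verilator's DfgVertex)
struct Vertex {
    std::string op;  // Operator symbol, or the variable name for leaves
    std::vector<int> sources;  // Indices of operand vertices (empty for leaves)
    int sinkCount;  // Number of vertices reading this one
};

class Renderer {
    const std::vector<Vertex>& m_graph;
    std::map<int, std::string> m_temps;  // Vertex index -> temporary name
    std::vector<std::string> m_assigns;  // Emitted "lhs = expr" statements

    // Render the expression computed by a vertex (unary or binary operators only)
    std::string renderExpr(int idx) {
        const Vertex& v = m_graph[idx];
        if (v.sources.empty()) return v.op;
        if (v.sources.size() == 1) return v.op + ref(v.sources[0]);
        // Evaluate operands in a fixed order so temporary numbering is deterministic
        const std::string lhs = ref(v.sources[0]);
        const std::string rhs = ref(v.sources[1]);
        return "(" + lhs + " " + v.op + " " + rhs + ")";
    }

    // Reference a vertex as an operand. Recursion stops at multi-sink vertices,
    // which are assigned to a temporary exactly once.
    std::string ref(int idx) {
        const Vertex& v = m_graph[idx];
        if (v.sources.empty() || v.sinkCount <= 1) return renderExpr(idx);
        const auto it = m_temps.find(idx);
        if (it != m_temps.end()) return it->second;
        const std::string expr = renderExpr(idx);
        const std::string name = "_tmp" + std::to_string(m_temps.size());
        m_temps.emplace(idx, name);
        m_assigns.push_back(name + " = " + expr);
        return name;
    }

public:
    explicit Renderer(const std::vector<Vertex>& graph)
        : m_graph(graph) {}

    // Emit 'target = <expression of root>', preceded by any temporaries it needs
    std::vector<std::string> render(const std::string& target, int root) {
        const std::string expr = renderExpr(root);
        m_assigns.push_back(target + " = " + expr);
        return m_assigns;
    }
};
```

With vertices `a`, `b`, `a & b` (read twice), `~(a & b)` and their `|` as the root, rendering into `y` produces one assignment to a temporary for the shared `a & b` followed by the assignment to `y` that uses the temporary twice.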

#include "config_build.h"
#include "verilatedos.h"

#include "V3Dfg.h"
#include "V3DfgPasses.h"
#include "V3UniqueNames.h"

#include <algorithm>
#include <unordered_map>

VL_DEFINE_DEBUG_FUNCTIONS;

namespace {

// Create an AstNodeMath out of a DfgVertex. For most AstNodeMath subtypes, this can be done
// automatically. For the few special cases, we provide specializations below
template <typename Node, typename... Ops>
Node* makeNode(const DfgForAst<Node>* vtxp, Ops... ops) {
    Node* const nodep = new Node{vtxp->fileline(), ops...};
    UASSERT_OBJ(nodep->width() == static_cast<int>(vtxp->width()), vtxp,
                "Incorrect width in AstNode created from DfgVertex "
                    << vtxp->typeName() << ": " << nodep->width() << " vs " << vtxp->width());
    return nodep;
}

//======================================================================
// Vertices needing special conversion

template <>
AstExtend* makeNode<AstExtend, AstNodeMath*>(  //
    const DfgExtend* vtxp, AstNodeMath* op1) {
    return new AstExtend{vtxp->fileline(), op1, static_cast<int>(vtxp->width())};
}

template <>
AstExtendS* makeNode<AstExtendS, AstNodeMath*>(  //
    const DfgExtendS* vtxp, AstNodeMath* op1) {
    return new AstExtendS{vtxp->fileline(), op1, static_cast<int>(vtxp->width())};
}

template <>
AstShiftL* makeNode<AstShiftL, AstNodeMath*, AstNodeMath*>(  //
    const DfgShiftL* vtxp, AstNodeMath* op1, AstNodeMath* op2) {
    return new AstShiftL{vtxp->fileline(), op1, op2, static_cast<int>(vtxp->width())};
}

template <>
AstShiftR* makeNode<AstShiftR, AstNodeMath*, AstNodeMath*>(  //
    const DfgShiftR* vtxp, AstNodeMath* op1, AstNodeMath* op2) {
    return new AstShiftR{vtxp->fileline(), op1, op2, static_cast<int>(vtxp->width())};
}

template <>
AstShiftRS* makeNode<AstShiftRS, AstNodeMath*, AstNodeMath*>(  //
    const DfgShiftRS* vtxp, AstNodeMath* op1, AstNodeMath* op2) {
    return new AstShiftRS{vtxp->fileline(), op1, op2, static_cast<int>(vtxp->width())};
}

//======================================================================
// Currently unhandled nodes - see corresponding AstToDfg functions
// LCOV_EXCL_START
template <>
AstCCast* makeNode<AstCCast, AstNodeMath*>(const DfgCCast* vtxp, AstNodeMath*) {
    vtxp->v3fatal("not implemented");
}
template <>
AstAtoN* makeNode<AstAtoN, AstNodeMath*>(const DfgAtoN* vtxp, AstNodeMath*) {
    vtxp->v3fatal("not implemented");
}
template <>
AstCompareNN* makeNode<AstCompareNN, AstNodeMath*, AstNodeMath*>(const DfgCompareNN* vtxp,
                                                                 AstNodeMath*, AstNodeMath*) {
    vtxp->v3fatal("not implemented");
}
template <>
AstSliceSel* makeNode<AstSliceSel, AstNodeMath*, AstNodeMath*, AstNodeMath*>(
    const DfgSliceSel* vtxp, AstNodeMath*, AstNodeMath*, AstNodeMath*) {
    vtxp->v3fatal("not implemented");
}
// LCOV_EXCL_STOP

} // namespace
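
The pattern used above, a generic variadic factory plus explicit specializations for node types whose constructors need an extra width argument, can be shown in isolation. This is only a sketch under simplified assumptions: `Add` and `Extend` are hypothetical stand-ins for AST node types, and the width is passed directly rather than read from a vertex.

```cpp
#include <string>

// Hypothetical node types: 'Extend' needs the result width, 'Add' does not
struct Add {
    std::string lhs;
    std::string rhs;
    std::string str() const { return "(" + lhs + " + " + rhs + ")"; }
};
struct Extend {
    std::string op;
    int width;
    std::string str() const { return "extend" + std::to_string(width) + "(" + op + ")"; }
};

// Generic case: build the node from its operands alone
template <typename Node, typename... Ops>
Node makeNode(int /*width*/, Ops... ops) {
    return Node{ops...};
}

// Special case: an explicit specialization that also passes the width on
template <>
Extend makeNode<Extend, std::string>(int width, std::string op) {
    return Extend{op, width};
}
```

A call such as `makeNode<Extend>(16, std::string("a"))` deduces the operand pack and selects the specialization, while `makeNode<Add>(8, std::string("a"), std::string("b"))` goes through the generic template. Keeping the special cases as specializations lets generated code call one uniform name for every node type.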

class DfgToAstVisitor final : DfgVisitor {
    // NODE STATE
    // AstVar::user1() bool: this is a temporary we are introducing

    const VNUser1InUse m_inuser1;

    // STATE

    AstModule* const m_modp;  // The parent/result module
    V3DfgOptimizationContext& m_ctx;  // The optimization context for stats
    AstNodeMath* m_resultp = nullptr;  // The result node of the current traversal
    // Map from DfgVertex to the AstVar holding the value of that DfgVertex after conversion
    std::unordered_map<const DfgVertex*, AstVar*> m_resultVars;
    // Map from an AstVar, to the canonical AstVar that can be substituted for that AstVar
    std::unordered_map<AstVar*, AstVar*> m_canonVars;
    V3UniqueNames m_tmpNames{"_VdfgTmp"};  // For generating temporary names
    DfgVertex::HashCache m_hashCache;  // For caching hashes

    // METHODS

    // Given a DfgVarPacked, return the canonical AstVar that can be used for this DfgVarPacked.
    // Also builds the m_canonVars map as a side effect.
    AstVar* getCanonicalVar(const DfgVarPacked* vtxp) {
        // If variable driven (at least partially) outside the DFG, then we have no choice
        if (!vtxp->isDrivenFullyByDfg()) return vtxp->varp();

        // Look up map
        const auto it = m_canonVars.find(vtxp->varp());
        if (it != m_canonVars.end()) return it->second;

        // Not known yet, compute it (for all vars driven fully from the same driver)
        std::vector<const DfgVarPacked*> varps;
        vtxp->source(0)->forEachSink([&](const DfgVertex& vtx) {
            if (const DfgVarPacked* const varVtxp = vtx.cast<DfgVarPacked>()) {
                if (varVtxp->isDrivenFullyByDfg()) varps.push_back(varVtxp);
            }
        });
        UASSERT_OBJ(!varps.empty(), vtxp, "The input vtxp is always available");
        std::stable_sort(varps.begin(), varps.end(),
                         [](const DfgVarPacked* ap, const DfgVarPacked* bp) {
                             if (ap->hasExtRefs() != bp->hasExtRefs()) return ap->hasExtRefs();
                             const FileLine& aFl = *(ap->fileline());
                             const FileLine& bFl = *(bp->fileline());
                             if (const int cmp = aFl.operatorCompare(bFl)) return cmp < 0;
                             return ap->varp()->name() < bp->varp()->name();
                         });
        AstVar* const canonVarp = varps.front()->varp();

        // Add results to map
        for (const DfgVarPacked* const varp : varps) m_canonVars.emplace(varp->varp(), canonVarp);

        // Return it
        return canonVarp;
    }

    // Given a DfgVertex, return an AstVar that will hold the value of the given DfgVertex once we
    // are done with converting this Dfg into Ast form.
    AstVar* getResultVar(const DfgVertex* vtxp) {
        const auto pair = m_resultVars.emplace(vtxp, nullptr);
        AstVar*& varp = pair.first->second;
        if (pair.second) {
            // If this vertex is a DfgVarPacked, then we know the variable. If this node is not a
            // DfgVarPacked, then first we try to find a DfgVarPacked driven by this node, and use
            // that, otherwise we create a temporary
            if (const DfgVarPacked* const thisDfgVarPackedp = vtxp->cast<DfgVarPacked>()) {
                // This is a DfgVarPacked
                varp = getCanonicalVar(thisDfgVarPackedp);
            } else if (const DfgVarArray* const thisDfgVarArrayp = vtxp->cast<DfgVarArray>()) {
                // This is a DfgVarArray
                varp = thisDfgVarArrayp->varp();
            } else if (const DfgVarPacked* const sinkDfgVarPackedp = vtxp->findSink<DfgVarPacked>(
                           [](const DfgVarPacked& var) { return var.isDrivenFullyByDfg(); })) {
                // We found a DfgVarPacked driven fully by this node
                varp = getCanonicalVar(sinkDfgVarPackedp);
            } else {
                // No DfgVarPacked driven fully by this node. Create a temporary.
                // TODO: should we reuse parts when the AstVar is used as an rvalue?
2022-09-23 17:46:22 +02:00
                const string name = m_tmpNames.get(vtxp->hash(m_hashCache).toString());
                // Note: It is ok for these temporary variables to be always unsigned. They are
                // read only by other expressions within the graph and all expressions interpret
                // their operands based on the expression type, not the operand type.
                AstNodeDType* const dtypep = v3Global.rootp()->findBitDType(
                    vtxp->width(), vtxp->width(), VSigning::UNSIGNED);
                varp = new AstVar{vtxp->fileline(), VVarType::MODULETEMP, name, dtypep};
2022-09-27 01:06:50 +02:00
                varp->user1(true); // Mark as temporary
2022-09-23 17:46:22 +02:00
                // Add temporary AstVar to containing module
                m_modp->addStmtsp(varp);
            }
            // Add to map
        }
        return varp;
    }

    AstNodeMath* convertDfgVertexToAstNodeMath(DfgVertex* vtxp) {
        UASSERT_OBJ(!m_resultp, vtxp, "Result already computed");
        iterate(vtxp);
        UASSERT_OBJ(m_resultp, vtxp, "Missing result");
        AstNodeMath* const resultp = m_resultp;
        m_resultp = nullptr;
        return resultp;
    }
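The conversion helper above relies on a single result slot that the visit methods fill in and the caller then drains. A minimal standalone sketch of that pattern follows; `ToyNode` and `ToyConverter` are hypothetical names used only for illustration and are not part of Verilator, and exceptions stand in for `UASSERT_OBJ`.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical node type: just carries an integer value.
struct ToyNode {
    int value;
};

// Hypothetical converter that mirrors the "result slot" pattern:
// visit() writes the slot, convert() checks, takes, and resets it.
class ToyConverter {
    bool m_hasResult = false;  // True while the slot holds an unclaimed result
    std::string m_result;      // The slot itself

    void visit(const ToyNode& node) {
        m_result = std::to_string(node.value);
        m_hasResult = true;
    }

public:
    std::string convert(const ToyNode& node) {
        // The slot must be empty on entry, otherwise a result would be lost
        if (m_hasResult) throw std::logic_error("Result already computed");
        visit(node);
        // Every visit must produce a result
        if (!m_hasResult) throw std::logic_error("Missing result");
        std::string result = std::move(m_result);
        // Reset the slot so the next conversion starts clean
        m_result.clear();
        m_hasResult = false;
        return result;
    }
};
```

Resetting the slot before returning is what lets the same converter be reused for many vertices, including recursively from inside a visit method.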
2022-09-27 01:06:50 +02:00
    bool inlineVertex(DfgVertex& vtx) {
        // Inline vertices that drive only a single node, or are special
        if (!vtx.hasMultipleSinks()) return true;
        if (vtx.is<DfgConst>()) return true;
        if (vtx.is<DfgVarPacked>()) return true;
        if (vtx.is<DfgVarArray>()) return true;
        if (const DfgArraySel* const selp = vtx.cast<DfgArraySel>()) {
            return selp->bitp()->is<DfgConst>();
        }
        return false;
    }
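The inlining rule above can be read as: repeat an expression at each use when doing so is free, otherwise store it once in a temporary. The following is a simplified sketch of that decision, assuming a hypothetical `ToyVertex` with only the fields needed here; it omits the constant-index array select case and is not the Verilator implementation.

```cpp
#include <cstddef>

// Hypothetical stand-in for a DFG vertex; only what the decision needs.
struct ToyVertex {
    std::size_t sinkCount;  // Number of vertices that read this vertex
    bool isConst;           // Vertex is a constant
    bool isVariable;        // Vertex is a variable (already has a name)
};

// Returns true if the vertex can be emitted inline at each use,
// false if its value should be stored in a temporary first.
bool shouldInline(const ToyVertex& vtx) {
    if (vtx.sinkCount <= 1) return true;  // At most one use: nothing is duplicated
    if (vtx.isConst) return true;         // Constants are cheap to repeat
    if (vtx.isVariable) return true;      // Variables can simply be referenced again
    return false;                         // Shared expression: needs a temporary
}
```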
2022-09-23 17:46:22 +02:00
    AstNodeMath* convertSource(DfgVertex* vtxp) {
2022-09-27 01:06:50 +02:00
        if (inlineVertex(*vtxp)) {
            // Inlined vertices are simply recursively converted
2022-09-23 17:46:22 +02:00
            UASSERT_OBJ(vtxp->hasSinks(), vtxp, "Must have one sink: " << vtxp->typeName());
            return convertDfgVertexToAstNodeMath(vtxp);
2022-09-27 01:06:50 +02:00
        } else {
            // Vertices that are not inlined need a variable, just return a reference
            return new AstVarRef{vtxp->fileline(), getResultVar(vtxp), VAccess::READ};
2022-09-23 17:46:22 +02:00
        }
    }
2022-09-27 01:06:50 +02:00
    void convertCanonicalVarDriver(const DfgVarPacked* dfgVarp) {
2022-09-25 17:03:15 +02:00
        const auto wRef = [dfgVarp]() {
            return new AstVarRef{dfgVarp->fileline(), dfgVarp->varp(), VAccess::WRITE};
        };
        if (dfgVarp->isDrivenFullyByDfg()) {
            // Whole variable is driven. Render driver and assign directly to whole variable.
            AstNodeMath* const rhsp = convertDfgVertexToAstNodeMath(dfgVarp->source(0));
            addResultEquation(dfgVarp->driverFileLine(0), wRef(), rhsp);
        } else {
            // Variable is driven partially. Render each driver as a separate assignment.
            dfgVarp->forEachSourceEdge([&](const DfgEdge& edge, size_t idx) {
                UASSERT_OBJ(edge.sourcep(), dfgVarp, "Should have removed undriven sources");
                // Render the rhs expression
                AstNodeMath* const rhsp = convertDfgVertexToAstNodeMath(edge.sourcep());
                // Create select LValue
                FileLine* const flp = dfgVarp->driverFileLine(idx);
                AstConst* const lsbp = new AstConst{flp, dfgVarp->driverLsb(idx)};
                AstConst* const widthp = new AstConst{flp, edge.sourcep()->width()};
                AstSel* const lhsp = new AstSel{flp, wRef(), lsbp, widthp};
                // Add assignment of the value to the selected bits
                addResultEquation(flp, lhsp, rhsp);
            });
        }
    }
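The two branches above produce either one whole-variable assignment or one part-select assignment per driver. As a rough illustration of the resulting shape, the sketch below renders the assignments as text instead of AST nodes. `ToyDriver` and `renderDrivers` are hypothetical names, and treating "fully driven" as a single driver covering all bits is an assumption of this sketch, not a statement about `isDrivenFullyByDfg()`.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one driver of a packed variable.
struct ToyDriver {
    unsigned lsb;     // Least significant bit driven
    unsigned width;   // Number of bits driven
    std::string rhs;  // Already rendered right-hand side expression
};

// Renders continuous assignments for a variable of the given width.
std::vector<std::string> renderDrivers(const std::string& var, unsigned varWidth,
                                       const std::vector<ToyDriver>& drivers) {
    std::vector<std::string> out;
    // Assumption: a single driver covering every bit counts as "fully driven"
    if (drivers.size() == 1 && drivers[0].lsb == 0 && drivers[0].width == varWidth) {
        out.push_back("assign " + var + " = " + drivers[0].rhs + ";");
        return out;
    }
    // Otherwise emit one indexed part-select assignment per driver
    for (const ToyDriver& d : drivers) {
        out.push_back("assign " + var + "[" + std::to_string(d.lsb) + " +: "
                      + std::to_string(d.width) + "] = " + d.rhs + ";");
    }
    return out;
}
```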
2022-09-27 01:06:50 +02:00
    void convertDuplicateVarDriver(const DfgVarPacked* dfgVarp, AstVar* canonVarp) {
2022-09-25 17:03:15 +02:00
        const auto rRef = [canonVarp]() {
            return new AstVarRef{canonVarp->fileline(), canonVarp, VAccess::READ};
        };
        const auto wRef = [dfgVarp]() {
            return new AstVarRef{dfgVarp->fileline(), dfgVarp->varp(), VAccess::WRITE};
        };
        if (dfgVarp->isDrivenFullyByDfg()) {
            // Whole variable is driven. Just assign from the canonical variable.
            addResultEquation(dfgVarp->driverFileLine(0), wRef(), rRef());
        } else {
            // Variable is driven partially. Assign from parts of the canonical var.
            dfgVarp->forEachSourceEdge([&](const DfgEdge& edge, size_t idx) {
                UASSERT_OBJ(edge.sourcep(), dfgVarp, "Should have removed undriven sources");
                // Create select LValue
                FileLine* const flp = dfgVarp->driverFileLine(idx);
                AstConst* const lsbp = new AstConst{flp, dfgVarp->driverLsb(idx)};
                AstConst* const widthp = new AstConst{flp, edge.sourcep()->width()};
                AstSel* const rhsp = new AstSel{flp, rRef(), lsbp, widthp->cloneTree(false)};
                AstSel* const lhsp = new AstSel{flp, wRef(), lsbp->cloneTree(false), widthp};
                // Add assignment of the value to the selected bits
                addResultEquation(flp, lhsp, rhsp);
            });
        }
    }
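When several variables share one driver, the code above renders the driver once into the canonical variable and copies the others from it. A small text-based sketch of that idea follows. `renderAliases` is a hypothetical helper, and picking the first name as canonical is an arbitrary choice made for this example only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Renders assignments for variables that all carry the value of driverExpr.
// The first variable is treated as canonical (an arbitrary choice in this sketch).
std::vector<std::string> renderAliases(const std::vector<std::string>& vars,
                                       const std::string& driverExpr) {
    std::vector<std::string> out;
    if (vars.empty()) return out;
    const std::string& canon = vars.front();
    // The driver expression is rendered exactly once
    out.push_back("assign " + canon + " = " + driverExpr + ";");
    // Every other variable just reads the canonical one
    for (std::size_t i = 1; i < vars.size(); ++i) {
        out.push_back("assign " + vars[i] + " = " + canon + ";");
    }
    return out;
}
```

Rendering the driver only once keeps a potentially large expression from being duplicated for every alias.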
2022-09-27 01:06:50 +02:00
    void convertArrayDiver(const DfgVarArray* dfgVarp) {
        // Array is driven element by element. Render each driver as a separate assignment.
        dfgVarp->forEachSourceEdge([&](const DfgEdge& edge, size_t idx) {
            UASSERT_OBJ(edge.sourcep(), dfgVarp, "Should have removed undriven sources");
            // Render the rhs expression
            AstNodeMath* const rhsp = convertDfgVertexToAstNodeMath(edge.sourcep());
            // Create select LValue
            FileLine* const flp = dfgVarp->driverFileLine(idx);
            AstVarRef* const refp = new AstVarRef{flp, dfgVarp->varp(), VAccess::WRITE};
            AstConst* const idxp = new AstConst{flp, dfgVarp->driverIndex(idx)};
            AstArraySel* const lhsp = new AstArraySel{flp, refp, idxp};
            // Add assignment of the value to the selected element
            addResultEquation(flp, lhsp, rhsp);
        });
    }
2022-09-25 17:03:15 +02:00
    void addResultEquation(FileLine* flp, AstNode* lhsp, AstNode* rhsp) {
        m_modp->addStmtsp(new AstAssignW{flp, lhsp, rhsp});
        ++m_ctx.m_resultEquations;
    }
2022-09-23 17:46:22 +02:00
    // VISITORS
    void visit(DfgVertex* vtxp) override { // LCOV_EXCL_START
        vtxp->v3fatal("Unhandled DfgVertex: " << vtxp->typeName());
    } // LCOV_EXCL_STOP
2022-09-27 01:06:50 +02:00
    void visit(DfgVarPacked* vtxp) override {
2022-09-23 17:46:22 +02:00
        m_resultp = new AstVarRef{vtxp->fileline(), getCanonicalVar(vtxp), VAccess::READ};
    }
2022-09-27 01:06:50 +02:00
    void visit(DfgVarArray* vtxp) override {
        m_resultp = new AstVarRef{vtxp->fileline(), vtxp->varp(), VAccess::READ};
    }
2022-09-23 17:46:22 +02:00
    void visit(DfgConst* vtxp) override { //
        m_resultp = vtxp->constp()->cloneTree(false);
    }

    // The rest of the 'visit' methods are generated by 'astgen'
#include "V3Dfg__gen_dfg_to_ast.h"

    // Constructor
    explicit DfgToAstVisitor(DfgGraph& dfg, V3DfgOptimizationContext& ctx)
        : m_modp{dfg.modulep()}
        , m_ctx{ctx} {
        // We can eliminate some variables completely
        std::vector<AstVar*> redundantVarps;
2022-09-25 17:03:15 +02:00
        // Convert vertices back to assignments
2022-09-23 17:46:22 +02:00
        dfg.forEachVertex([&](DfgVertex& vtx) {
2022-09-27 01:06:50 +02:00
            // Render packed variable assignments
            if (const DfgVarPacked* const dfgVarp = vtx.cast<DfgVarPacked>()) {
                // DfgVarPacked instances (these might be driving the given AstVar variable)
                // If there is no driver (i.e.: this DfgVarPacked is an input to the Dfg), then
2022-09-25 17:03:15 +02:00
                // nothing to do
                if (!dfgVarp->isDrivenByDfg()) return;
2022-09-27 01:06:50 +02:00
                // The driver of this DfgVarPacked might drive multiple variables. Only emit one
2022-09-23 17:46:22 +02:00
                // assignment from the driver to an arbitrarily chosen canonical variable, and
                // assign the other variables from that canonical variable
                AstVar* const canonVarp = getCanonicalVar(dfgVarp);
                if (canonVarp == dfgVarp->varp()) {
                    // This is the canonical variable, so render the driver
                    convertCanonicalVarDriver(dfgVarp);
                } else if (dfgVarp->keep()) {
                    // Not the canonical variable but it must be kept
                    convertDuplicateVarDriver(dfgVarp, canonVarp);
                } else {
                    // Not a canonical var, and it can be removed. We will replace all references
                    // to it with the canonical variable, and hence this can be removed.
                    redundantVarps.push_back(dfgVarp->varp());
                    ++m_ctx.m_replacedVars;
                }
                return;
            }

            // Render array variable assignments
            if (const DfgVarArray* dfgVarp = vtx.cast<DfgVarArray>()) {
                // If there is no driver, then there is nothing to do
                if (!dfgVarp->isDrivenByDfg()) return;
                // We don't canonicalize arrays, so just render the drivers
                convertArrayDiver(dfgVarp);
                //
                return;
            }

            // If the vertex is known to be inlined, then nothing else to do
            if (inlineVertex(vtx)) return;

            // Check if this uses a temporary, vs one of the vars rendered above
            AstVar* const resultVarp = getResultVar(&vtx);
            if (resultVarp->user1()) {
                // We introduced a temporary for this DfgVertex
                ++m_ctx.m_intermediateVars;
                FileLine* const flp = vtx.fileline();
                // Just render the logic
                AstNodeMath* const rhsp = convertDfgVertexToAstNodeMath(&vtx);
                // The lhs is a temporary
                AstNodeMath* const lhsp = new AstVarRef{flp, resultVarp, VAccess::WRITE};
                // Add assignment of the value to the variable
                addResultEquation(flp, lhsp, rhsp);
            }
        });

        // Remap all references to point to the canonical variables, if one exists
        VNDeleter deleter;
        m_modp->foreach<AstVarRef>([&](AstVarRef* refp) {
            // Any variable that is written outside the DFG will have itself as the canonical
            // var, so need not be replaced, furthermore, if a variable is traced, we don't
            // want to update the write ref we just created above, so we only replace read only
            // references.
            if (!refp->access().isReadOnly()) return;
            const auto it = m_canonVars.find(refp->varp());
            if (it == m_canonVars.end()) return;
            if (it->second == refp->varp()) return;
            refp->replaceWith(new AstVarRef{refp->fileline(), it->second, refp->access()});
            deleter.pushDeletep(refp);
        });

        // Remove redundant variables
        for (AstVar* const varp : redundantVarps) varp->unlinkFrBack()->deleteTree();
    }

public:
    static AstModule* apply(DfgGraph& dfg, V3DfgOptimizationContext& ctx) {
        return DfgToAstVisitor{dfg, ctx}.m_modp;
    }
};

AstModule* V3DfgPasses::dfgToAst(DfgGraph& dfg, V3DfgOptimizationContext& ctx) {
    return DfgToAstVisitor::apply(dfg, ctx);
}