Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// -*- mode: C++; c-file-style: "cc-mode" -*-
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
// DESCRIPTION: Verilator: Peephole optimizations over DfgGraph
|
|
|
|
|
//
|
|
|
|
|
// Code available from: https://verilator.org
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
2023-01-01 16:18:39 +01:00
|
|
|
// Copyright 2003-2023 by Wilson Snyder. This program is free software; you
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// can redistribute it and/or modify it under the terms of either the GNU
|
|
|
|
|
// Lesser General Public License Version 3 or the Perl Artistic License
|
|
|
|
|
// Version 2.0.
|
|
|
|
|
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
|
|
|
|
// A pattern-matching based optimizer for DfgGraph. This is in some aspects similar to V3Const, but
|
|
|
|
|
// more powerful in that it does not care about ordering combinational statement. This is also less
|
|
|
|
|
// broadly applicable than V3Const, as it does not apply to procedural statements with sequential
|
|
|
|
|
// execution semantics.
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
|
2023-10-18 12:37:46 +02:00
|
|
|
#include "V3PchAstNoMT.h" // VL_MT_DISABLED_CODE_UNIT
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3DfgPeephole.h"
|
|
|
|
|
|
|
|
|
|
#include "V3Dfg.h"
|
|
|
|
|
#include "V3DfgPasses.h"
|
|
|
|
|
#include "V3Stats.h"
|
|
|
|
|
|
|
|
|
|
#include <cctype>
|
|
|
|
|
|
|
|
|
|
VL_DEFINE_DEBUG_FUNCTIONS;
|
|
|
|
|
|
|
|
|
|
V3DfgPeepholeContext::V3DfgPeepholeContext(const std::string& label)
|
|
|
|
|
: m_label{label} {
|
|
|
|
|
const auto checkEnabled = [this](VDfgPeepholePattern id) {
|
|
|
|
|
string str{id.ascii()};
|
|
|
|
|
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { //
|
|
|
|
|
return c == '_' ? '-' : std::tolower(c);
|
|
|
|
|
});
|
|
|
|
|
m_enabled[id] = v3Global.opt.fDfgPeepholeEnabled(str);
|
|
|
|
|
};
|
|
|
|
|
#define OPTIMIZATION_CHECK_ENABLED(id, name) checkEnabled(VDfgPeepholePattern::id);
|
|
|
|
|
FOR_EACH_DFG_PEEPHOLE_OPTIMIZATION(OPTIMIZATION_CHECK_ENABLED)
|
|
|
|
|
#undef OPTIMIZATION_CHECK_ENABLED
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
V3DfgPeepholeContext::~V3DfgPeepholeContext() {
|
|
|
|
|
const auto emitStat = [this](VDfgPeepholePattern id) {
|
|
|
|
|
string str{id.ascii()};
|
|
|
|
|
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { //
|
|
|
|
|
return c == '_' ? ' ' : std::tolower(c);
|
|
|
|
|
});
|
|
|
|
|
V3Stats::addStat("Optimizations, DFG " + m_label + " Peephole, " + str, m_count[id]);
|
|
|
|
|
};
|
|
|
|
|
#define OPTIMIZATION_EMIT_STATS(id, name) emitStat(VDfgPeepholePattern::id);
|
|
|
|
|
FOR_EACH_DFG_PEEPHOLE_OPTIMIZATION(OPTIMIZATION_EMIT_STATS)
|
|
|
|
|
#undef OPTIMIZATION_EMIT_STATS
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
// clang-format off
|
|
|
|
|
template <typename T_Reduction>
|
|
|
|
|
struct ReductionToBitwiseImpl {};
|
|
|
|
|
template <> struct ReductionToBitwiseImpl<DfgRedAnd> { using type = DfgAnd; };
|
|
|
|
|
template <> struct ReductionToBitwiseImpl<DfgRedOr> { using type = DfgOr; };
|
|
|
|
|
template <> struct ReductionToBitwiseImpl<DfgRedXor> { using type = DfgXor; };
|
2023-11-11 05:25:53 +01:00
|
|
|
template <typename T_Reduction>
|
|
|
|
|
using ReductionToBitwise = typename ReductionToBitwiseImpl<T_Reduction>::type;
|
2022-09-30 17:19:53 +02:00
|
|
|
|
|
|
|
|
template <typename T_Bitwise>
|
|
|
|
|
struct BitwiseToReductionImpl {};
|
|
|
|
|
template <> struct BitwiseToReductionImpl<DfgAnd> { using type = DfgRedAnd; };
|
|
|
|
|
template <> struct BitwiseToReductionImpl<DfgOr> { using type = DfgRedOr; };
|
|
|
|
|
template <> struct BitwiseToReductionImpl<DfgXor> { using type = DfgRedXor; };
|
2023-11-11 05:25:53 +01:00
|
|
|
template <typename T_Reduction>
|
|
|
|
|
using BitwiseToReduction = typename BitwiseToReductionImpl<T_Reduction>::type;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
namespace {
|
|
|
|
|
template<typename Vertex> void foldOp(V3Number& out, const V3Number& src);
|
|
|
|
|
template <> void foldOp<DfgCLog2> (V3Number& out, const V3Number& src) { out.opCLog2(src); }
|
|
|
|
|
template <> void foldOp<DfgCountOnes> (V3Number& out, const V3Number& src) { out.opCountOnes(src); }
|
|
|
|
|
template <> void foldOp<DfgExtend> (V3Number& out, const V3Number& src) { out.opAssign(src); }
|
|
|
|
|
template <> void foldOp<DfgExtendS> (V3Number& out, const V3Number& src) { out.opExtendS(src, src.width()); }
|
|
|
|
|
template <> void foldOp<DfgLogNot> (V3Number& out, const V3Number& src) { out.opLogNot(src); }
|
|
|
|
|
template <> void foldOp<DfgNegate> (V3Number& out, const V3Number& src) { out.opNegate(src); }
|
|
|
|
|
template <> void foldOp<DfgNot> (V3Number& out, const V3Number& src) { out.opNot(src); }
|
|
|
|
|
template <> void foldOp<DfgOneHot> (V3Number& out, const V3Number& src) { out.opOneHot(src); }
|
|
|
|
|
template <> void foldOp<DfgOneHot0> (V3Number& out, const V3Number& src) { out.opOneHot0(src); }
|
|
|
|
|
template <> void foldOp<DfgRedAnd> (V3Number& out, const V3Number& src) { out.opRedAnd(src); }
|
|
|
|
|
template <> void foldOp<DfgRedOr> (V3Number& out, const V3Number& src) { out.opRedOr(src); }
|
|
|
|
|
template <> void foldOp<DfgRedXor> (V3Number& out, const V3Number& src) { out.opRedXor(src); }
|
|
|
|
|
|
|
|
|
|
template<typename Vertex> void foldOp(V3Number& out, const V3Number& lhs, const V3Number& rhs);
|
|
|
|
|
template <> void foldOp<DfgAdd> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opAdd(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgAnd> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opAnd(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgConcat> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opConcat(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgDiv> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opDiv(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgDivS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opDivS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgEq> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opEq(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGt> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGt(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGtS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGtS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGte> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGte(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGteS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGteS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogAnd> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogAnd(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogEq> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogEq(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogIf> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogIf(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogOr> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogOr(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLt> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLt(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLtS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLtS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLte> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLte(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLteS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLtS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgModDiv> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opModDiv(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgModDivS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opModDivS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgMul> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opMul(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgMulS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opMulS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgNeq> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opNeq(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgOr> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opOr(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPow> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPow(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPowSS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPowSS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPowSU> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPowSU(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPowUS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPowUS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgReplicate> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opRepl(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgShiftL> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opShiftL(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgShiftR> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opShiftR(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgShiftRS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opShiftRS(lhs, rhs, lhs.width()); }
|
|
|
|
|
template <> void foldOp<DfgSub> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opSub(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgXor> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opXor(lhs, rhs); }
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
// clang-format on
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
class V3DfgPeephole final : public DfgVisitor {
|
|
|
|
|
|
|
|
|
|
// STATE
|
|
|
|
|
DfgGraph& m_dfg; // The DfgGraph being visited
|
|
|
|
|
V3DfgPeepholeContext& m_ctx; // The config structure
|
2022-11-04 16:50:20 +01:00
|
|
|
// bool m_changed = false; // Changed a vertex
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
AstNodeDType* const m_bitDType = DfgVertex::dtypeForWidth(1); // Common, so grab it up front
|
2022-11-04 16:50:20 +01:00
|
|
|
// Head of work list. Note that we want all next pointers in the list to be non-zero (including
|
|
|
|
|
// that of the last element). This allows as to do two important things: detect if an element
|
2022-12-03 00:46:38 +01:00
|
|
|
// is in the list by checking for a non-zero next pointer, and easy prefetching without
|
2022-11-04 16:50:20 +01:00
|
|
|
// conditionals. The 'this' pointer is a good sentinel as it is a valid memory address, and we
|
|
|
|
|
// can easily check for the end of the list.
|
|
|
|
|
DfgVertex* m_workListp = reinterpret_cast<DfgVertex*>(this);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
#define APPLYING(id) if (checkApplying(VDfgPeepholePattern::id))
|
|
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
bool checkApplying(VDfgPeepholePattern id) {
|
|
|
|
|
if (!m_ctx.m_enabled[id]) return false;
|
|
|
|
|
UINFO(9, "Applying DFG patten " << id.ascii() << endl);
|
|
|
|
|
++m_ctx.m_count[id];
|
2022-11-04 16:50:20 +01:00
|
|
|
// m_changed = true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
2022-11-04 16:50:20 +01:00
|
|
|
void addToWorkList(DfgVertex* vtxp) {
|
|
|
|
|
// We only process actual operation vertices
|
|
|
|
|
if (vtxp->is<DfgConst>() || vtxp->is<DfgVertexVar>()) return;
|
|
|
|
|
// If already in work list then nothing to do
|
|
|
|
|
if (vtxp->getUser<DfgVertex*>()) return;
|
|
|
|
|
// Actually add to work list.
|
|
|
|
|
vtxp->setUser<DfgVertex*>(m_workListp);
|
|
|
|
|
m_workListp = vtxp;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void addSourcesToWorkList(DfgVertex* vtxp) {
|
|
|
|
|
vtxp->forEachSource([&](DfgVertex& src) { addToWorkList(&src); });
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void addSinksToWorkList(DfgVertex* vtxp) {
|
|
|
|
|
vtxp->forEachSink([&](DfgVertex& src) { addToWorkList(&src); });
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void deleteVertex(DfgVertex* vtxp) {
|
|
|
|
|
// Add all sources to the work list
|
|
|
|
|
addSourcesToWorkList(vtxp);
|
|
|
|
|
// If in work list then we can't delete it just yet (as we can't remove from the middle of
|
|
|
|
|
// the work list), but it will be deleted when the work list is processed.
|
|
|
|
|
if (vtxp->getUser<DfgVertex*>()) return;
|
|
|
|
|
// Otherwise we can delete it now.
|
|
|
|
|
VL_DO_DANGLING(vtxp->unlinkDelete(m_dfg), vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void replace(DfgVertex* vtxp, DfgVertex* replacementp) {
|
|
|
|
|
// Add sinks of replaced vertex to the work list
|
|
|
|
|
addSinksToWorkList(vtxp);
|
|
|
|
|
// Add replacement to the work list
|
|
|
|
|
addToWorkList(replacementp);
|
|
|
|
|
// Replace vertex with the replacement
|
|
|
|
|
vtxp->replaceWith(replacementp);
|
|
|
|
|
// Vertex is now unused, so delete it
|
|
|
|
|
deleteVertex(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void modified(DfgVertex* vtxp) {
|
|
|
|
|
// Add sinks of modified vertex to the work list
|
|
|
|
|
addSinksToWorkList(vtxp);
|
|
|
|
|
// Add the modified vertex itself to the work list
|
|
|
|
|
addToWorkList(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Shorthand
|
|
|
|
|
static AstNodeDType* dtypeForWidth(uint32_t width) { return DfgVertex::dtypeForWidth(width); }
|
|
|
|
|
|
2022-10-07 16:44:14 +02:00
|
|
|
// Create a 32-bit DfgConst vertex
|
|
|
|
|
DfgConst* makeI32(FileLine* flp, uint32_t val) { return new DfgConst{m_dfg, flp, 32, val}; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-07 16:44:14 +02:00
|
|
|
// Create a DfgConst vertex with the given width and value zero
|
|
|
|
|
DfgConst* makeZero(FileLine* flp, uint32_t width) { return new DfgConst{m_dfg, flp, width}; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-11-04 16:50:20 +01:00
|
|
|
// Create a new vertex of the given type
|
|
|
|
|
template <typename Vertex, typename... Args>
|
|
|
|
|
Vertex* make(FileLine* flp, Args&&... args) {
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
static_assert(!std::is_same<DfgConst, Vertex>::value, "Use 'makeZero' instead");
|
|
|
|
|
static_assert(!std::is_base_of<DfgVertexVar, Vertex>::value, "Can't create variables");
|
|
|
|
|
// Create the new vertex
|
|
|
|
|
Vertex* const vtxp = new Vertex{m_dfg, flp, std::forward<Args>(args)...};
|
|
|
|
|
// Add to work list.
|
|
|
|
|
vtxp->template setUser<DfgVertex*>(m_workListp);
|
|
|
|
|
m_workListp = vtxp;
|
|
|
|
|
// Return new node
|
|
|
|
|
return vtxp;
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
// Constant fold unary vertex, return true if folded
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
bool foldUnary(Vertex* vtxp) {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertexUnary, Vertex>::value, "Must invoke on unary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
if (DfgConst* const srcp = vtxp->srcp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_UNARY) {
|
|
|
|
|
DfgConst* const resultp = makeZero(vtxp->fileline(), vtxp->width());
|
|
|
|
|
foldOp<Vertex>(resultp->num(), srcp->num());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, resultp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Constant fold binary vertex, return true if folded
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
bool foldBinary(Vertex* vtxp) {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Vertex>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
if (DfgConst* const lhsp = vtxp->lhsp()->template cast<DfgConst>()) {
|
|
|
|
|
if (DfgConst* const rhsp = vtxp->rhsp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_BINARY) {
|
|
|
|
|
DfgConst* const resultp = makeZero(vtxp->fileline(), vtxp->width());
|
|
|
|
|
foldOp<Vertex>(resultp->num(), lhsp->num(), rhsp->num());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, resultp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
// Rotate the expression tree rooted at 'vtxp' to the right ('vtxp->lhsp()' becomes root,
|
|
|
|
|
// producing a right-leaning tree). Warning: only valid for associative operations.
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
void rotateRight(Vertex* vtxp) {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Vertex>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
DfgVertexBinary* const ap = vtxp;
|
|
|
|
|
DfgVertexBinary* const bp = vtxp->lhsp()->template as<Vertex>();
|
|
|
|
|
UASSERT_OBJ(!bp->hasMultipleSinks(), vtxp, "Can't rotate a non-tree");
|
|
|
|
|
ap->replaceWith(bp);
|
|
|
|
|
ap->lhsp(bp->rhsp());
|
|
|
|
|
bp->rhsp(ap);
|
|
|
|
|
// Concatenation dtypes need to be fixed up, other associative nodes preserve types
|
|
|
|
|
if VL_CONSTEXPR_CXX17 (std::is_same<DfgConcat, Vertex>::value) {
|
|
|
|
|
ap->dtypep(dtypeForWidth(ap->lhsp()->width() + ap->rhsp()->width()));
|
|
|
|
|
bp->dtypep(dtypeForWidth(bp->lhsp()->width() + bp->rhsp()->width()));
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Transformations that apply to all associative binary vertices.
|
|
|
|
|
// Returns true if vtxp was replaced.
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
bool associativeBinary(Vertex* vtxp) {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Vertex>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
DfgConst* const lConstp = lhsp->cast<DfgConst>();
|
|
|
|
|
DfgConst* const rConstp = rhsp->cast<DfgConst>();
|
|
|
|
|
|
|
|
|
|
if (lConstp && rConstp) {
|
|
|
|
|
APPLYING(FOLD_ASSOC_BINARY) {
|
|
|
|
|
DfgConst* const resultp = makeZero(flp, vtxp->width());
|
|
|
|
|
foldOp<Vertex>(resultp->num(), lConstp->num(), rConstp->num());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, resultp);
|
2022-10-05 11:36:28 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (lConstp) {
|
|
|
|
|
if (Vertex* const rVtxp = rhsp->cast<Vertex>()) {
|
|
|
|
|
if (DfgConst* const rlConstp = rVtxp->lhsp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_ASSOC_BINARY_LHS_OF_RHS) {
|
|
|
|
|
// Fold constants
|
|
|
|
|
const uint32_t width = std::is_same<DfgConcat, Vertex>::value
|
|
|
|
|
? lConstp->width() + rlConstp->width()
|
|
|
|
|
: vtxp->width();
|
|
|
|
|
DfgConst* const constp = makeZero(flp, width);
|
|
|
|
|
foldOp<Vertex>(constp->num(), lConstp->num(), rlConstp->num());
|
|
|
|
|
|
|
|
|
|
// Replace vertex
|
2022-10-17 16:03:30 +02:00
|
|
|
if (!rVtxp->hasMultipleSinks()) {
|
2022-10-05 11:36:28 +02:00
|
|
|
rVtxp->lhsp(constp);
|
|
|
|
|
rVtxp->dtypep(vtxp->dtypep());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, rVtxp);
|
|
|
|
|
return true;
|
2022-10-05 11:36:28 +02:00
|
|
|
} else {
|
2022-11-04 16:50:20 +01:00
|
|
|
Vertex* const resp = make<Vertex>(flp, vtxp->dtypep());
|
2022-10-05 11:36:28 +02:00
|
|
|
resp->lhsp(constp);
|
|
|
|
|
resp->rhsp(rVtxp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, resp);
|
|
|
|
|
return true;
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (rConstp) {
|
|
|
|
|
if (Vertex* const lVtxp = lhsp->cast<Vertex>()) {
|
|
|
|
|
if (DfgConst* const lrConstp = lVtxp->rhsp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_ASSOC_BINARY_RHS_OF_LHS) {
|
|
|
|
|
// Fold constants
|
|
|
|
|
const uint32_t width = std::is_same<DfgConcat, Vertex>::value
|
|
|
|
|
? lrConstp->width() + rConstp->width()
|
|
|
|
|
: vtxp->width();
|
|
|
|
|
DfgConst* const constp = makeZero(flp, width);
|
|
|
|
|
foldOp<Vertex>(constp->num(), lrConstp->num(), rConstp->num());
|
|
|
|
|
|
|
|
|
|
// Replace vertex
|
2022-10-17 16:03:30 +02:00
|
|
|
if (!lVtxp->hasMultipleSinks()) {
|
2022-10-05 11:36:28 +02:00
|
|
|
lVtxp->rhsp(constp);
|
|
|
|
|
lVtxp->dtypep(vtxp->dtypep());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, lVtxp);
|
|
|
|
|
return true;
|
2022-10-05 11:36:28 +02:00
|
|
|
} else {
|
2022-11-04 16:50:20 +01:00
|
|
|
Vertex* const resp = make<Vertex>(flp, vtxp->dtypep());
|
2022-10-05 11:36:28 +02:00
|
|
|
resp->lhsp(lVtxp->lhsp());
|
|
|
|
|
resp->rhsp(constp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, resp);
|
|
|
|
|
return true;
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Make associative trees right leaning to reduce pattern variations, and for better CSE
|
|
|
|
|
while (vtxp->lhsp()->template is<Vertex>() && !vtxp->lhsp()->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(RIGHT_LEANING_ASSOC) {
|
|
|
|
|
rotateRight(vtxp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-05 11:36:28 +02:00
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Transformations that apply to all commutative binary vertices
|
2022-10-04 12:03:41 +02:00
|
|
|
void commutativeBinary(DfgVertexBinary* vtxp) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const lhsp = vtxp->source<0>();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->source<1>();
|
|
|
|
|
// Ensure Const is on left-hand side to simplify other patterns
|
|
|
|
|
if (lhsp->is<DfgConst>()) return;
|
|
|
|
|
if (rhsp->is<DfgConst>()) {
|
|
|
|
|
APPLYING(SWAP_CONST_IN_COMMUTATIVE_BINARY) {
|
|
|
|
|
vtxp->lhsp(rhsp);
|
|
|
|
|
vtxp->rhsp(lhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// Ensure Not is on the left-hand side to simplify other patterns
|
|
|
|
|
if (lhsp->is<DfgNot>()) return;
|
|
|
|
|
if (rhsp->is<DfgNot>()) {
|
|
|
|
|
APPLYING(SWAP_NOT_IN_COMMUTATIVE_BINARY) {
|
|
|
|
|
vtxp->lhsp(rhsp);
|
|
|
|
|
vtxp->rhsp(lhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// If both sides are variable references, order the side in some defined way. This allows
|
|
|
|
|
// CSE to later merge 'a op b' with 'b op a'.
|
2022-10-04 12:03:41 +02:00
|
|
|
if (lhsp->is<DfgVertexVar>() && rhsp->is<DfgVertexVar>()) {
|
|
|
|
|
AstVar* const lVarp = lhsp->as<DfgVertexVar>()->varp();
|
|
|
|
|
AstVar* const rVarp = rhsp->as<DfgVertexVar>()->varp();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (lVarp->name() > rVarp->name()) {
|
|
|
|
|
APPLYING(SWAP_VAR_IN_COMMUTATIVE_BINARY) {
|
|
|
|
|
vtxp->lhsp(rhsp);
|
|
|
|
|
vtxp->rhsp(lhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Bitwise operation with one side Const, and the other side a Concat
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
bool tryPushBitwiseOpThroughConcat(Vertex* vtxp, DfgConst* constp, DfgConcat* concatp) {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(constp->dtypep() == concatp->dtypep(), vtxp, "Mismatched widths");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
// If at least one of the sides of the Concat constant, or width 1 (i.e.: can be
|
|
|
|
|
// further simplified), then push the Vertex past the Concat
|
|
|
|
|
if (concatp->lhsp()->is<DfgConst>() || concatp->rhsp()->is<DfgConst>() //
|
2022-10-06 13:02:46 +02:00
|
|
|
|| concatp->lhsp()->dtypep() == m_bitDType
|
|
|
|
|
|| concatp->rhsp()->dtypep() == m_bitDType) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(PUSH_BITWISE_OP_THROUGH_CONCAT) {
|
|
|
|
|
const uint32_t width = concatp->width();
|
|
|
|
|
AstNodeDType* const lDtypep = concatp->lhsp()->dtypep();
|
|
|
|
|
AstNodeDType* const rDtypep = concatp->rhsp()->dtypep();
|
|
|
|
|
const uint32_t lWidth = lDtypep->width();
|
|
|
|
|
const uint32_t rWidth = rDtypep->width();
|
|
|
|
|
|
|
|
|
|
// The new Lhs vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
Vertex* const newLhsp = make<Vertex>(flp, lDtypep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgConst* const newLhsConstp = makeZero(constp->fileline(), lWidth);
|
|
|
|
|
newLhsConstp->num().opSel(constp->num(), width - 1, rWidth);
|
|
|
|
|
newLhsp->lhsp(newLhsConstp);
|
|
|
|
|
newLhsp->rhsp(concatp->lhsp());
|
|
|
|
|
|
|
|
|
|
// The new Rhs vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
Vertex* const newRhsp = make<Vertex>(flp, rDtypep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgConst* const newRhsConstp = makeZero(constp->fileline(), rWidth);
|
|
|
|
|
newRhsConstp->num().opSel(constp->num(), rWidth - 1, 0);
|
|
|
|
|
newRhsp->lhsp(newRhsConstp);
|
|
|
|
|
newRhsp->rhsp(concatp->rhsp());
|
|
|
|
|
|
|
|
|
|
// The replacement Concat vertex
|
|
|
|
|
DfgConcat* const newConcat
|
2022-11-04 16:50:20 +01:00
|
|
|
= make<DfgConcat>(concatp->fileline(), concatp->dtypep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newConcat->lhsp(newLhsp);
|
|
|
|
|
newConcat->rhsp(newRhsp);
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, newConcat);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
bool tryPushCompareOpThroughConcat(Vertex* vtxp, DfgConst* constp, DfgConcat* concatp) {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(constp->dtypep() == concatp->dtypep(), vtxp, "Mismatched widths");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
// If at least one of the sides of the Concat is constant, then push the Vertex past the
|
|
|
|
|
// Concat
|
|
|
|
|
if (concatp->lhsp()->is<DfgConst>() || concatp->rhsp()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_COMPARE_OP_THROUGH_CONCAT) {
|
|
|
|
|
const uint32_t width = concatp->width();
|
|
|
|
|
const uint32_t lWidth = concatp->lhsp()->width();
|
|
|
|
|
const uint32_t rWidth = concatp->rhsp()->width();
|
|
|
|
|
|
|
|
|
|
// The new Lhs vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
Vertex* const newLhsp = make<Vertex>(flp, m_bitDType);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgConst* const newLhsConstp = makeZero(constp->fileline(), lWidth);
|
|
|
|
|
newLhsConstp->num().opSel(constp->num(), width - 1, rWidth);
|
|
|
|
|
newLhsp->lhsp(newLhsConstp);
|
|
|
|
|
newLhsp->rhsp(concatp->lhsp());
|
|
|
|
|
|
|
|
|
|
// The new Rhs vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
Vertex* const newRhsp = make<Vertex>(flp, m_bitDType);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgConst* const newRhsConstp = makeZero(constp->fileline(), rWidth);
|
|
|
|
|
newRhsConstp->num().opSel(constp->num(), rWidth - 1, 0);
|
|
|
|
|
newRhsp->lhsp(newRhsConstp);
|
|
|
|
|
newRhsp->rhsp(concatp->rhsp());
|
|
|
|
|
|
|
|
|
|
// The replacement Vertex
|
2022-10-04 12:03:41 +02:00
|
|
|
DfgVertexBinary* const replacementp
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
= std::is_same<Vertex, DfgEq>::value
|
2022-11-04 16:50:20 +01:00
|
|
|
? make<DfgAnd>(concatp->fileline(), m_bitDType)
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
: nullptr;
|
|
|
|
|
UASSERT_OBJ(replacementp, vtxp,
|
|
|
|
|
"Unhandled vertex type in 'tryPushCompareOpThroughConcat': "
|
|
|
|
|
<< vtxp->typeName());
|
|
|
|
|
replacementp->relinkSource<0>(newLhsp);
|
|
|
|
|
replacementp->relinkSource<1>(newRhsp);
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
template <typename Bitwise>
|
|
|
|
|
bool tryPushBitwiseOpThroughReductions(Bitwise* vtxp) {
|
|
|
|
|
using Reduction = BitwiseToReduction<Bitwise>;
|
|
|
|
|
|
|
|
|
|
if (Reduction* const lRedp = vtxp->lhsp()->template cast<Reduction>()) {
|
|
|
|
|
if (Reduction* const rRedp = vtxp->rhsp()->template cast<Reduction>()) {
|
|
|
|
|
DfgVertex* const lSrcp = lRedp->srcp();
|
|
|
|
|
DfgVertex* const rSrcp = rRedp->srcp();
|
2022-10-04 19:26:53 +02:00
|
|
|
if (lSrcp->dtypep() == rSrcp->dtypep() && lSrcp->width() <= 64
|
|
|
|
|
&& !lSrcp->hasMultipleSinks() && !rSrcp->hasMultipleSinks()) {
|
2022-09-30 17:19:53 +02:00
|
|
|
APPLYING(PUSH_BITWISE_THROUGH_REDUCTION) {
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
2022-11-04 16:50:20 +01:00
|
|
|
Bitwise* const bwp = make<Bitwise>(flp, lSrcp->dtypep());
|
2022-09-30 17:19:53 +02:00
|
|
|
bwp->lhsp(lSrcp);
|
|
|
|
|
bwp->rhsp(rSrcp);
|
2022-11-04 16:50:20 +01:00
|
|
|
Reduction* const redp = make<Reduction>(flp, m_bitDType);
|
2022-09-30 17:19:53 +02:00
|
|
|
redp->srcp(bwp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, redp);
|
2022-09-30 17:19:53 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <typename Reduction>
|
|
|
|
|
void optimizeReduction(Reduction* vtxp) {
|
|
|
|
|
using Bitwise = ReductionToBitwise<Reduction>;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const srcp = vtxp->srcp();
|
2022-09-30 17:19:53 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
// Reduction of 1-bit value
|
2022-10-06 13:02:46 +02:00
|
|
|
if (srcp->dtypep() == m_bitDType) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REMOVE_WIDTH_ONE_REDUCTION) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, srcp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
if (DfgCond* const condp = srcp->cast<DfgCond>()) {
|
|
|
|
|
if (condp->thenp()->is<DfgConst>() || condp->elsep()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_REDUCTION_THROUGH_COND_WITH_CONST_BRANCH) {
|
|
|
|
|
// The new 'then' vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
Reduction* const newThenp = make<Reduction>(flp, m_bitDType);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newThenp->srcp(condp->thenp());
|
|
|
|
|
|
|
|
|
|
// The new 'else' vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
Reduction* const newElsep = make<Reduction>(flp, m_bitDType);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newElsep->srcp(condp->elsep());
|
|
|
|
|
|
|
|
|
|
// The replacement Cond vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(condp->fileline(), m_bitDType);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newCondp->condp(condp->condp());
|
|
|
|
|
newCondp->thenp(newThenp);
|
|
|
|
|
newCondp->elsep(newElsep);
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, newCondp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (DfgConcat* const concatp = srcp->cast<DfgConcat>()) {
|
2022-10-05 23:54:36 +02:00
|
|
|
if (concatp->lhsp()->is<DfgConst>() || concatp->rhsp()->is<DfgConst>()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
APPLYING(PUSH_REDUCTION_THROUGH_CONCAT) {
|
|
|
|
|
// Reduce the parts of the concatenation
|
2022-11-04 16:50:20 +01:00
|
|
|
Reduction* const lRedp = make<Reduction>(concatp->fileline(), m_bitDType);
|
2022-10-04 19:26:53 +02:00
|
|
|
lRedp->srcp(concatp->lhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
Reduction* const rRedp = make<Reduction>(concatp->fileline(), m_bitDType);
|
2022-10-04 19:26:53 +02:00
|
|
|
rRedp->srcp(concatp->rhsp());
|
|
|
|
|
|
|
|
|
|
// Bitwise reduce the results
|
2022-11-04 16:50:20 +01:00
|
|
|
Bitwise* const replacementp = make<Bitwise>(flp, m_bitDType);
|
2022-10-04 19:26:53 +02:00
|
|
|
replacementp->lhsp(lRedp);
|
|
|
|
|
replacementp->rhsp(rRedp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 12:03:41 +02:00
|
|
|
void optimizeShiftRHS(DfgVertexBinary* vtxp) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (const DfgConcat* const concatp = vtxp->rhsp()->cast<DfgConcat>()) {
|
|
|
|
|
if (concatp->lhsp()->isZero()) { // Drop redundant zero extension
|
2022-11-04 16:50:20 +01:00
|
|
|
APPLYING(REMOVE_REDUNDANT_ZEXT_ON_RHS_OF_SHIFT) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
vtxp->rhsp(concatp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// VISIT methods
|
|
|
|
|
|
|
|
|
|
void visit(DfgVertex*) override {}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexUnary
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
|
|
|
|
void visit(DfgCLog2* vtxp) override {
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgCountOnes* vtxp) override {
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void visit(DfgExtend* vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
UASSERT_OBJ(vtxp->width() > vtxp->srcp()->width(), vtxp, "Invalid zero extend");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
// Convert all Extend into Concat with zeros. This simplifies other patterns as they only
|
2022-09-30 17:19:53 +02:00
|
|
|
// need to handle Concat, which is more generic, and don't need special cases for
|
|
|
|
|
// Extend.
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_EXTEND) {
|
2022-10-04 22:18:39 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgConcat* const replacementp = make<DfgConcat>(flp, vtxp->dtypep());
|
2022-10-04 22:18:39 +02:00
|
|
|
replacementp->lhsp(makeZero(flp, vtxp->width() - vtxp->srcp()->width()));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
replacementp->rhsp(vtxp->srcp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
|
|
|
|
return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
void visit(DfgExtendS* vtxp) override {
|
|
|
|
|
UASSERT_OBJ(vtxp->width() > vtxp->srcp()->width(), vtxp, "Invalid sign extend");
|
|
|
|
|
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLogNot* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == m_bitDType, vtxp, "Incorrect width");
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgNegate* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->srcp()->dtypep(), vtxp, "Mismatched width");
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void visit(DfgNot* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->srcp()->dtypep(), vtxp, "Mismatched width");
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Not of Cond
|
|
|
|
|
if (DfgCond* const condp = vtxp->srcp()->cast<DfgCond>()) {
|
|
|
|
|
// If at least one of the branches are a constant, push the Not past the Cond
|
|
|
|
|
if (condp->thenp()->is<DfgConst>() || condp->elsep()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_NOT_THROUGH_COND) {
|
|
|
|
|
// The new 'then' vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const newThenp = make<DfgNot>(vtxp->fileline(), vtxp->dtypep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newThenp->srcp(condp->thenp());
|
|
|
|
|
|
|
|
|
|
// The new 'else' vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const newElsep = make<DfgNot>(vtxp->fileline(), vtxp->dtypep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newElsep->srcp(condp->elsep());
|
|
|
|
|
|
|
|
|
|
// The replacement Cond vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(condp->fileline(), vtxp->dtypep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
newCondp->condp(condp->condp());
|
|
|
|
|
newCondp->thenp(newThenp);
|
|
|
|
|
newCondp->elsep(newElsep);
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, newCondp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Not of Not
|
|
|
|
|
if (DfgNot* const notp = vtxp->srcp()->cast<DfgNot>()) {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == notp->srcp()->dtypep(), vtxp, "Width mismatch");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REMOVE_NOT_NOT) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, notp->srcp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!vtxp->srcp()->hasMultipleSinks()) {
|
|
|
|
|
// Not of Eq
|
|
|
|
|
if (DfgEq* const eqp = vtxp->srcp()->cast<DfgEq>()) {
|
|
|
|
|
APPLYING(REPLACE_NOT_EQ) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNeq* const replacementp = make<DfgNeq>(eqp->fileline(), vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
replacementp->lhsp(eqp->lhsp());
|
|
|
|
|
replacementp->rhsp(eqp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 19:26:53 +02:00
|
|
|
// Not of Neq
|
|
|
|
|
if (DfgNeq* const neqp = vtxp->srcp()->cast<DfgNeq>()) {
|
|
|
|
|
APPLYING(REPLACE_NOT_NEQ) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgEq* const replacementp = make<DfgEq>(neqp->fileline(), vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
replacementp->lhsp(neqp->lhsp());
|
|
|
|
|
replacementp->rhsp(neqp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
void visit(DfgOneHot* vtxp) override {
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
void visit(DfgOneHot0* vtxp) override {
|
|
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgRedOr* vtxp) override { optimizeReduction(vtxp); }
|
|
|
|
|
|
|
|
|
|
void visit(DfgRedAnd* vtxp) override { optimizeReduction(vtxp); }
|
|
|
|
|
|
|
|
|
|
void visit(DfgRedXor* vtxp) override { optimizeReduction(vtxp); }
|
|
|
|
|
|
2022-10-06 19:34:18 +02:00
|
|
|
void visit(DfgSel* vtxp) override {
|
|
|
|
|
DfgVertex* const fromp = vtxp->fromp();
|
|
|
|
|
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
const uint32_t lsb = vtxp->lsb();
|
|
|
|
|
const uint32_t width = vtxp->width();
|
|
|
|
|
const uint32_t msb = lsb + width - 1;
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const constp = fromp->cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_SEL) {
|
|
|
|
|
DfgConst* const replacementp = makeZero(flp, width);
|
|
|
|
|
replacementp->num().opSel(constp->num(), msb, lsb);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Full width select, replace with the source.
|
|
|
|
|
if (fromp->width() == width) {
|
|
|
|
|
UASSERT_OBJ(lsb == 0, fromp, "OOPS");
|
|
|
|
|
APPLYING(REMOVE_FULL_WIDTH_SEL) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, fromp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Concat
|
|
|
|
|
if (DfgConcat* const concatp = fromp->cast<DfgConcat>()) {
|
|
|
|
|
DfgVertex* const lhsp = concatp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = concatp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (msb < rhsp->width()) {
|
|
|
|
|
// If the select is entirely from rhs, then replace with sel from rhs
|
|
|
|
|
APPLYING(REMOVE_SEL_FROM_RHS_OF_CONCAT) { //
|
|
|
|
|
vtxp->fromp(rhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
} else if (lsb >= rhsp->width()) {
|
|
|
|
|
// If the select is entirely from the lhs, then replace with sel from lhs
|
|
|
|
|
APPLYING(REMOVE_SEL_FROM_LHS_OF_CONCAT) {
|
|
|
|
|
vtxp->fromp(lhsp);
|
|
|
|
|
vtxp->lsb(lsb - rhsp->width());
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
} else if (lsb == 0 || msb == concatp->width() - 1 //
|
|
|
|
|
|| lhsp->is<DfgConst>() || rhsp->is<DfgConst>() //
|
|
|
|
|
|| !concatp->hasMultipleSinks()) {
|
|
|
|
|
// If the select straddles both sides, but at least one of the sides is wholly
|
|
|
|
|
// selected, or at least one of the sides is a Const, or this concat has no other
|
|
|
|
|
// use, then push the Sel past the Concat
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_CONCAT) {
|
|
|
|
|
const uint32_t rSelWidth = rhsp->width() - lsb;
|
|
|
|
|
const uint32_t lSelWidth = width - rSelWidth;
|
|
|
|
|
|
|
|
|
|
// The new Lhs vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgSel* const newLhsp = make<DfgSel>(flp, dtypeForWidth(lSelWidth));
|
2022-10-06 19:34:18 +02:00
|
|
|
newLhsp->fromp(lhsp);
|
|
|
|
|
newLhsp->lsb(0);
|
|
|
|
|
|
|
|
|
|
// The new Rhs vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgSel* const newRhsp = make<DfgSel>(flp, dtypeForWidth(rSelWidth));
|
2022-10-06 19:34:18 +02:00
|
|
|
newRhsp->fromp(rhsp);
|
|
|
|
|
newRhsp->lsb(lsb);
|
|
|
|
|
|
|
|
|
|
// The replacement Concat vertex
|
|
|
|
|
DfgConcat* const newConcat
|
2022-11-04 16:50:20 +01:00
|
|
|
= make<DfgConcat>(concatp->fileline(), vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
newConcat->lhsp(newLhsp);
|
|
|
|
|
newConcat->rhsp(newRhsp);
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, newConcat);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgReplicate* const repp = fromp->cast<DfgReplicate>()) {
|
|
|
|
|
// If the Sel is wholly into the source of the Replicate, push the Sel through the
|
|
|
|
|
// Replicate and apply it directly to the source of the Replicate.
|
|
|
|
|
const uint32_t srcWidth = repp->srcp()->width();
|
|
|
|
|
if (width <= srcWidth) {
|
|
|
|
|
const uint32_t newLsb = lsb % srcWidth;
|
|
|
|
|
if (newLsb + width <= srcWidth) {
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_REPLICATE) {
|
|
|
|
|
vtxp->fromp(repp->srcp());
|
|
|
|
|
vtxp->lsb(newLsb);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Not
|
|
|
|
|
if (DfgNot* const notp = fromp->cast<DfgNot>()) {
|
|
|
|
|
// Replace "Sel from Not" with "Not of Sel"
|
|
|
|
|
if (!notp->hasMultipleSinks()) {
|
|
|
|
|
UASSERT_OBJ(notp->srcp()->dtypep() == notp->dtypep(), notp, "Mismatched widths");
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_NOT) {
|
|
|
|
|
// Make Sel select from source of Not
|
|
|
|
|
vtxp->fromp(notp->srcp());
|
|
|
|
|
// Add Not after Sel
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const replacementp = make<DfgNot>(notp->fileline(), vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
vtxp->replaceWith(replacementp);
|
|
|
|
|
replacementp->srcp(vtxp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Sel
|
|
|
|
|
if (DfgSel* const selp = fromp->cast<DfgSel>()) {
|
|
|
|
|
APPLYING(REPLACE_SEL_FROM_SEL) {
|
|
|
|
|
// Make this Sel select from the source of the source Sel
|
|
|
|
|
vtxp->fromp(selp->fromp());
|
|
|
|
|
// Adjust LSB
|
|
|
|
|
vtxp->lsb(lsb + selp->lsb());
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Cond
|
|
|
|
|
if (DfgCond* const condp = fromp->cast<DfgCond>()) {
|
|
|
|
|
// If at least one of the branches are a constant, push the select past the cond
|
|
|
|
|
if (condp->thenp()->is<DfgConst>() || condp->elsep()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_COND) {
|
|
|
|
|
// The new 'then' vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgSel* const newThenp = make<DfgSel>(flp, vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
newThenp->fromp(condp->thenp());
|
|
|
|
|
newThenp->lsb(lsb);
|
|
|
|
|
|
|
|
|
|
// The new 'else' vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgSel* const newElsep = make<DfgSel>(flp, vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
newElsep->fromp(condp->elsep());
|
|
|
|
|
newElsep->lsb(lsb);
|
|
|
|
|
|
|
|
|
|
// The replacement Cond vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(condp->fileline(), vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
newCondp->condp(condp->condp());
|
|
|
|
|
newCondp->thenp(newThenp);
|
|
|
|
|
newCondp->elsep(newElsep);
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, newCondp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from ShiftL
|
|
|
|
|
if (DfgShiftL* const shiftLp = fromp->cast<DfgShiftL>()) {
|
|
|
|
|
// If selecting bottom bits of left shift, push the Sel before the shift
|
|
|
|
|
if (lsb == 0) {
|
|
|
|
|
UASSERT_OBJ(shiftLp->lhsp()->width() >= width, vtxp, "input of shift narrow");
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_SHIFTL) {
|
|
|
|
|
vtxp->fromp(shiftLp->lhsp());
|
|
|
|
|
DfgShiftL* const newShiftLp
|
2022-11-04 16:50:20 +01:00
|
|
|
= make<DfgShiftL>(shiftLp->fileline(), vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
vtxp->replaceWith(newShiftLp);
|
|
|
|
|
newShiftLp->lhsp(vtxp);
|
|
|
|
|
newShiftLp->rhsp(shiftLp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexBinary - bitwise
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void visit(DfgAnd* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
commutativeBinary(vtxp);
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
// Bubble pushing
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!vtxp->hasMultipleSinks() && !lhsp->hasMultipleSinks() && !rhsp->hasMultipleSinks()) {
|
|
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const rhsNotp = rhsp->cast<DfgNot>()) {
|
|
|
|
|
APPLYING(REPLACE_AND_OF_NOT_AND_NOT) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgOr* const orp = make<DfgOr>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
orp->lhsp(lhsNotp->srcp());
|
|
|
|
|
orp->rhsp(rhsNotp->srcp());
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const notp = make<DfgNot>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
notp->srcp(orp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, notp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
2022-10-04 19:26:53 +02:00
|
|
|
if (DfgNeq* const rhsNeqp = rhsp->cast<DfgNeq>()) {
|
|
|
|
|
APPLYING(REPLACE_AND_OF_NOT_AND_NEQ) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgOr* const orp = make<DfgOr>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
orp->lhsp(lhsNotp->srcp());
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgEq* const newRhsp = make<DfgEq>(rhsp->fileline(), rhsp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
newRhsp->lhsp(rhsNeqp->lhsp());
|
|
|
|
|
newRhsp->rhsp(rhsNeqp->rhsp());
|
|
|
|
|
orp->rhsp(newRhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const notp = make<DfgNot>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
notp->srcp(orp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, notp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const lhsConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (lhsConstp->isZero()) {
|
|
|
|
|
APPLYING(REPLACE_AND_WITH_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, lhsConstp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (lhsConstp->isOnes()) {
|
|
|
|
|
APPLYING(REMOVE_AND_WITH_ONES) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, rhsp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (tryPushBitwiseOpThroughConcat(vtxp, lhsConstp, rhsConcatp)) return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (tryPushBitwiseOpThroughReductions(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
// ~A & A is all zeroes
|
|
|
|
|
if (lhsNotp->srcp() == rhsp) {
|
|
|
|
|
APPLYING(REPLACE_CONTRADICTORY_AND) {
|
|
|
|
|
DfgConst* const replacementp = makeZero(flp, vtxp->width());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgOr* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
commutativeBinary(vtxp);
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
// Bubble pushing
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!vtxp->hasMultipleSinks() && !lhsp->hasMultipleSinks() && !rhsp->hasMultipleSinks()) {
|
|
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const rhsNotp = rhsp->cast<DfgNot>()) {
|
|
|
|
|
APPLYING(REPLACE_OR_OF_NOT_AND_NOT) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgAnd* const andp = make<DfgAnd>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
andp->lhsp(lhsNotp->srcp());
|
|
|
|
|
andp->rhsp(rhsNotp->srcp());
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const notp = make<DfgNot>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
notp->srcp(andp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, notp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-10-04 19:26:53 +02:00
|
|
|
if (DfgNeq* const rhsNeqp = rhsp->cast<DfgNeq>()) {
|
|
|
|
|
APPLYING(REPLACE_OR_OF_NOT_AND_NEQ) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgAnd* const andp = make<DfgAnd>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
andp->lhsp(lhsNotp->srcp());
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgEq* const newRhsp = make<DfgEq>(rhsp->fileline(), rhsp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
newRhsp->lhsp(rhsNeqp->lhsp());
|
|
|
|
|
newRhsp->rhsp(rhsNeqp->rhsp());
|
|
|
|
|
andp->rhsp(newRhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const notp = make<DfgNot>(flp, vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
notp->srcp(andp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, notp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConcat* const lhsConcatp = lhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
2022-10-06 13:02:46 +02:00
|
|
|
if (lhsConcatp->lhsp()->dtypep() == rhsConcatp->lhsp()->dtypep()) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (lhsConcatp->lhsp()->isZero() && rhsConcatp->rhsp()->isZero()) {
|
|
|
|
|
APPLYING(REPLACE_OR_OF_CONCAT_ZERO_LHS_AND_CONCAT_RHS_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgConcat* const replacementp = make<DfgConcat>(flp, vtxp->dtypep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
replacementp->lhsp(rhsConcatp->lhsp());
|
|
|
|
|
replacementp->rhsp(lhsConcatp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (lhsConcatp->rhsp()->isZero() && rhsConcatp->lhsp()->isZero()) {
|
|
|
|
|
APPLYING(REPLACE_OR_OF_CONCAT_LHS_ZERO_AND_CONCAT_ZERO_RHS) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgConcat* const replacementp = make<DfgConcat>(flp, vtxp->dtypep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
replacementp->lhsp(lhsConcatp->lhsp());
|
|
|
|
|
replacementp->rhsp(rhsConcatp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const lhsConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (lhsConstp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_OR_WITH_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, rhsp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (lhsConstp->isOnes()) {
|
|
|
|
|
APPLYING(REPLACE_OR_WITH_ONES) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, lhsp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (tryPushBitwiseOpThroughConcat(vtxp, lhsConstp, rhsConcatp)) return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (tryPushBitwiseOpThroughReductions(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
// ~A | A is all ones
|
|
|
|
|
if (lhsNotp->srcp() == rhsp) {
|
|
|
|
|
APPLYING(REPLACE_TAUTOLOGICAL_OR) {
|
|
|
|
|
DfgConst* const replacementp = makeZero(flp, vtxp->width());
|
|
|
|
|
replacementp->num().setAllBits1();
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgXor* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
commutativeBinary(vtxp);
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
2022-09-30 17:19:53 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (DfgConst* const lConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (lConstp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_XOR_WITH_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, rhsp);
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (lConstp->isOnes()) {
|
|
|
|
|
APPLYING(REPLACE_XOR_WITH_ONES) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const replacementp = make<DfgNot>(flp, vtxp->dtypep());
|
2022-09-30 17:19:53 +02:00
|
|
|
replacementp->srcp(rhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (DfgConcat* const rConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
tryPushBitwiseOpThroughConcat(vtxp, lConstp, rConcatp);
|
|
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
|
|
|
|
|
if (tryPushBitwiseOpThroughReductions(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexBinary - other
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void visit(DfgAdd* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
commutativeBinary(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
void visit(DfgArraySel* vtxp) override {
|
|
|
|
|
if (DfgConst* const idxp = vtxp->bitp()->cast<DfgConst>()) {
|
|
|
|
|
if (DfgVarArray* const varp = vtxp->fromp()->cast<DfgVarArray>()) {
|
2022-10-28 18:00:32 +02:00
|
|
|
const size_t idx = idxp->toSizeT();
|
2022-10-04 22:18:39 +02:00
|
|
|
if (DfgVertex* const driverp = varp->driverAt(idx)) {
|
|
|
|
|
APPLYING(INLINE_ARRAYSEL) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, driverp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
void visit(DfgConcat* vtxp) override {
|
2022-10-05 11:36:28 +02:00
|
|
|
UASSERT_OBJ(vtxp->width() == vtxp->lhsp()->width() + vtxp->rhsp()->width(), vtxp,
|
|
|
|
|
"Inconsistent Concat");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-06 13:02:46 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (lhsp->isZero()) {
|
|
|
|
|
DfgConst* const lConstp = lhsp->as<DfgConst>();
|
|
|
|
|
if (DfgSel* const rSelp = rhsp->cast<DfgSel>()) {
|
2022-10-06 19:34:18 +02:00
|
|
|
if (vtxp->dtypep() == rSelp->fromp()->dtypep()
|
|
|
|
|
&& rSelp->lsb() == lConstp->width()) {
|
|
|
|
|
APPLYING(REPLACE_CONCAT_ZERO_AND_SEL_TOP_WITH_SHIFTR) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgShiftR* const replacementp = make<DfgShiftR>(flp, vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
replacementp->lhsp(rSelp->fromp());
|
|
|
|
|
replacementp->rhsp(makeI32(flp, lConstp->width()));
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (rhsp->isZero()) {
|
|
|
|
|
DfgConst* const rConstp = rhsp->as<DfgConst>();
|
|
|
|
|
if (DfgSel* const lSelp = lhsp->cast<DfgSel>()) {
|
2022-10-06 19:34:18 +02:00
|
|
|
if (vtxp->dtypep() == lSelp->fromp()->dtypep() && lSelp->lsb() == 0) {
|
|
|
|
|
APPLYING(REPLACE_CONCAT_SEL_BOTTOM_AND_ZERO_WITH_SHIFTL) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgShiftL* const replacementp = make<DfgShiftL>(flp, vtxp->dtypep());
|
2022-10-06 19:34:18 +02:00
|
|
|
replacementp->lhsp(lSelp->fromp());
|
|
|
|
|
replacementp->rhsp(makeI32(flp, rConstp->width()));
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (DfgNot* const lNot = lhsp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const rNot = rhsp->cast<DfgNot>()) {
|
|
|
|
|
if (!lNot->hasMultipleSinks() && !rNot->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(PUSH_CONCAT_THROUGH_NOTS) {
|
|
|
|
|
vtxp->lhsp(lNot->srcp());
|
|
|
|
|
vtxp->rhsp(rNot->srcp());
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const replacementp = make<DfgNot>(flp, vtxp->dtypep());
|
2022-10-05 11:36:28 +02:00
|
|
|
vtxp->replaceWith(replacementp);
|
|
|
|
|
replacementp->srcp(vtxp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-05 11:36:28 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
const auto joinSels = [this](DfgSel* lSelp, DfgSel* rSelp, FileLine* flp) -> DfgSel* {
|
2022-10-06 19:34:18 +02:00
|
|
|
if (lSelp->fromp()->equals(*rSelp->fromp())) {
|
|
|
|
|
if (lSelp->lsb() == rSelp->lsb() + rSelp->width()) {
|
|
|
|
|
// Two consecutive Sels, make a single Sel.
|
|
|
|
|
const uint32_t width = lSelp->width() + rSelp->width();
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgSel* const joinedSelp = make<DfgSel>(flp, dtypeForWidth(width));
|
2022-10-06 19:34:18 +02:00
|
|
|
joinedSelp->fromp(rSelp->fromp());
|
|
|
|
|
joinedSelp->lsb(rSelp->lsb());
|
|
|
|
|
return joinedSelp;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return nullptr;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
DfgSel* const lSelp = lhsp->cast<DfgSel>();
|
|
|
|
|
DfgSel* const rSelp = rhsp->cast<DfgSel>();
|
|
|
|
|
if (lSelp && rSelp) {
|
|
|
|
|
if (DfgSel* const jointSelp = joinSels(lSelp, rSelp, flp)) {
|
|
|
|
|
APPLYING(REMOVE_CONCAT_OF_ADJOINING_SELS) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, jointSelp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (lSelp) {
|
|
|
|
|
if (DfgConcat* const rConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (DfgSel* const rlSelp = rConcatp->lhsp()->cast<DfgSel>()) {
|
|
|
|
|
if (DfgSel* const jointSelp = joinSels(lSelp, rlSelp, flp)) {
|
|
|
|
|
APPLYING(REPLACE_NESTED_CONCAT_OF_ADJOINING_SELS_ON_LHS) {
|
|
|
|
|
DfgConcat* const replacementp
|
2022-11-04 16:50:20 +01:00
|
|
|
= make<DfgConcat>(flp, vtxp->dtypep());
|
2022-10-04 22:18:39 +02:00
|
|
|
replacementp->lhsp(jointSelp);
|
|
|
|
|
replacementp->rhsp(rConcatp->rhsp());
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (rSelp) {
|
|
|
|
|
if (DfgConcat* const lConcatp = lhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (DfgSel* const lrlSelp = lConcatp->rhsp()->cast<DfgSel>()) {
|
|
|
|
|
if (DfgSel* const jointSelp = joinSels(lrlSelp, rSelp, flp)) {
|
|
|
|
|
APPLYING(REPLACE_NESTED_CONCAT_OF_ADJOINING_SELS_ON_RHS) {
|
|
|
|
|
DfgConcat* const replacementp
|
2022-11-04 16:50:20 +01:00
|
|
|
= make<DfgConcat>(flp, vtxp->dtypep());
|
2022-10-04 22:18:39 +02:00
|
|
|
replacementp->lhsp(lConcatp->lhsp());
|
|
|
|
|
replacementp->rhsp(jointSelp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgDiv* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgDivS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgEq* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
|
|
|
|
|
commutativeBinary(vtxp);
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const lhsConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (tryPushCompareOpThroughConcat(vtxp, lhsConstp, rhsConcatp)) return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgGt* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgGtS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgGte* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgGteS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLogAnd* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLogEq* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLogIf* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLogOr* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLt* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLtS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLte* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgLteS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgModDiv* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgModDivS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgMul* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
2022-10-05 11:36:28 +02:00
|
|
|
|
|
|
|
|
if (associativeBinary(vtxp)) return;
|
|
|
|
|
|
|
|
|
|
commutativeBinary(vtxp);
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgMulS* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
2022-10-05 11:36:28 +02:00
|
|
|
|
|
|
|
|
if (associativeBinary(vtxp)) return;
|
|
|
|
|
|
|
|
|
|
commutativeBinary(vtxp);
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgNeq* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgPow* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgPowSS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgPowSU* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgPowUS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgReplicate* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
if (vtxp->dtypep() == vtxp->srcp()->dtypep()) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(REMOVE_REPLICATE_ONCE) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, vtxp->srcp());
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgShiftL* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
optimizeShiftRHS(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgShiftR* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
optimizeShiftRHS(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgShiftRS* vtxp) override {
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
optimizeShiftRHS(vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgSub* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->lhsp()->dtypep(), vtxp, "Mismatched LHS width");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->rhsp()->dtypep(), vtxp, "Mismatched RHS width");
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const rConstp = rhsp->cast<DfgConst>()) {
|
|
|
|
|
if (rConstp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_SUB_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, lhsp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-28 18:00:32 +02:00
|
|
|
if (vtxp->dtypep() == m_bitDType && rConstp->hasValue(1)) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(REPLACE_SUB_WITH_NOT) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgNot* const replacementp = make<DfgNot>(vtxp->fileline(), m_bitDType);
|
2022-10-04 22:18:39 +02:00
|
|
|
replacementp->srcp(lhsp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, replacementp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexTernary
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void visit(DfgCond* vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->thenp()->dtypep(), vtxp, "Width mismatch");
|
|
|
|
|
UASSERT_OBJ(vtxp->dtypep() == vtxp->elsep()->dtypep(), vtxp, "Width mismatch");
|
2022-10-04 22:18:39 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const condp = vtxp->condp();
|
|
|
|
|
DfgVertex* const thenp = vtxp->thenp();
|
|
|
|
|
DfgVertex* const elsep = vtxp->elsep();
|
2022-10-04 22:18:39 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-06 13:02:46 +02:00
|
|
|
if (condp->dtypep() != m_bitDType) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
if (condp->isOnes()) {
|
|
|
|
|
APPLYING(REMOVE_COND_WITH_TRUE_CONDITION) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, thenp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (condp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_COND_WITH_FALSE_CONDITION) {
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, elsep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgNot* const condNotp = condp->cast<DfgNot>()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!condp->hasMultipleSinks() || condNotp->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(SWAP_COND_WITH_NOT_CONDITION) {
|
|
|
|
|
vtxp->condp(condNotp->srcp());
|
|
|
|
|
vtxp->thenp(elsep);
|
|
|
|
|
vtxp->elsep(thenp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgNeq* const condNeqp = condp->cast<DfgNeq>()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!condp->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(SWAP_COND_WITH_NEQ_CONDITION) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgEq* const newCondp = make<DfgEq>(condp->fileline(), condp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
newCondp->lhsp(condNeqp->lhsp());
|
|
|
|
|
newCondp->rhsp(condNeqp->rhsp());
|
|
|
|
|
vtxp->condp(newCondp);
|
|
|
|
|
vtxp->thenp(elsep);
|
|
|
|
|
vtxp->elsep(thenp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgNot* const thenNotp = thenp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const elseNotp = elsep->cast<DfgNot>()) {
|
2022-11-04 16:50:20 +01:00
|
|
|
if (!thenNotp->srcp()->is<DfgConst>() && !elseNotp->srcp()->is<DfgConst>()
|
2022-11-22 14:48:49 +01:00
|
|
|
&& !thenNotp->hasMultipleSinks() && !elseNotp->hasMultipleSinks()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
APPLYING(PULL_NOTS_THROUGH_COND) {
|
|
|
|
|
DfgNot* const replacementp
|
2022-11-04 16:50:20 +01:00
|
|
|
= make<DfgNot>(thenp->fileline(), vtxp->dtypep());
|
2022-10-04 19:26:53 +02:00
|
|
|
vtxp->thenp(thenNotp->srcp());
|
|
|
|
|
vtxp->elsep(elseNotp->srcp());
|
|
|
|
|
vtxp->replaceWith(replacementp);
|
|
|
|
|
replacementp->srcp(vtxp);
|
2022-11-04 16:50:20 +01:00
|
|
|
modified(vtxp);
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (vtxp->width() > 1) {
|
|
|
|
|
// 'cond ? a + 1 : a' -> 'a + cond'
|
|
|
|
|
if (DfgAdd* const thenAddp = thenp->cast<DfgAdd>()) {
|
|
|
|
|
if (DfgConst* const constp = thenAddp->lhsp()->cast<DfgConst>()) {
|
2022-10-28 18:00:32 +02:00
|
|
|
if (constp->hasValue(1)) {
|
2022-09-30 17:19:53 +02:00
|
|
|
if (thenAddp->rhsp() == elsep) {
|
|
|
|
|
APPLYING(REPLACE_COND_INC) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgConcat* const extp = make<DfgConcat>(flp, vtxp->dtypep());
|
2022-09-30 17:19:53 +02:00
|
|
|
extp->rhsp(condp);
|
|
|
|
|
extp->lhsp(makeZero(flp, vtxp->width() - 1));
|
|
|
|
|
FileLine* const thenFlp = thenAddp->fileline();
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgAdd* const addp = make<DfgAdd>(thenFlp, vtxp->dtypep());
|
2022-09-30 17:19:53 +02:00
|
|
|
addp->lhsp(thenAddp->rhsp());
|
|
|
|
|
addp->rhsp(extp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, addp);
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// 'cond ? a - 1 : a' -> 'a - cond'
|
|
|
|
|
if (DfgSub* const thenSubp = thenp->cast<DfgSub>()) {
|
|
|
|
|
if (DfgConst* const constp = thenSubp->rhsp()->cast<DfgConst>()) {
|
2022-10-28 18:00:32 +02:00
|
|
|
if (constp->hasValue(1)) {
|
2022-09-30 17:19:53 +02:00
|
|
|
if (thenSubp->lhsp() == elsep) {
|
|
|
|
|
APPLYING(REPLACE_COND_DEC) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgConcat* const extp = make<DfgConcat>(flp, vtxp->dtypep());
|
2022-09-30 17:19:53 +02:00
|
|
|
extp->rhsp(condp);
|
|
|
|
|
extp->lhsp(makeZero(flp, vtxp->width() - 1));
|
|
|
|
|
FileLine* const thenFlp = thenSubp->fileline();
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgSub* const subp = make<DfgSub>(thenFlp, vtxp->dtypep());
|
2022-09-30 17:19:53 +02:00
|
|
|
subp->lhsp(thenSubp->lhsp());
|
|
|
|
|
subp->rhsp(extp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, subp);
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-06 13:02:46 +02:00
|
|
|
if (vtxp->dtypep() == m_bitDType) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
AstNodeDType* const dtypep = vtxp->dtypep();
|
|
|
|
|
if (thenp->isZero()) { // a ? 0 : b becomes ~a & b
|
|
|
|
|
APPLYING(REPLACE_COND_WITH_THEN_BRANCH_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgAnd* const repalcementp = make<DfgAnd>(flp, dtypep);
|
|
|
|
|
DfgNot* const notp = make<DfgNot>(flp, dtypep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
notp->srcp(condp);
|
|
|
|
|
repalcementp->lhsp(notp);
|
|
|
|
|
repalcementp->rhsp(elsep);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, repalcementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (thenp->isOnes()) { // a ? 1 : b becomes a | b
|
|
|
|
|
APPLYING(REPLACE_COND_WITH_THEN_BRANCH_ONES) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgOr* const repalcementp = make<DfgOr>(flp, dtypep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
repalcementp->lhsp(condp);
|
|
|
|
|
repalcementp->rhsp(elsep);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, repalcementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (elsep->isZero()) { // a ? b : 0 becomes a & b
|
|
|
|
|
APPLYING(REPLACE_COND_WITH_ELSE_BRANCH_ZERO) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgAnd* const repalcementp = make<DfgAnd>(flp, dtypep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
repalcementp->lhsp(condp);
|
|
|
|
|
repalcementp->rhsp(thenp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, repalcementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (elsep->isOnes()) { // a ? b : 1 becomes ~a | b
|
|
|
|
|
APPLYING(REPLACE_COND_WITH_ELSE_BRANCH_ONES) {
|
2022-11-04 16:50:20 +01:00
|
|
|
DfgOr* const repalcementp = make<DfgOr>(flp, dtypep);
|
|
|
|
|
DfgNot* const notp = make<DfgNot>(flp, dtypep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
notp->srcp(condp);
|
|
|
|
|
repalcementp->lhsp(notp);
|
|
|
|
|
repalcementp->rhsp(thenp);
|
2022-11-04 16:50:20 +01:00
|
|
|
replace(vtxp, repalcementp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
#undef APPLYING
|
|
|
|
|
|
|
|
|
|
V3DfgPeephole(DfgGraph& dfg, V3DfgPeepholeContext& ctx)
|
|
|
|
|
: m_dfg{dfg}
|
2022-10-06 13:02:46 +02:00
|
|
|
, m_ctx{ctx} {
|
|
|
|
|
|
2022-11-04 16:50:20 +01:00
|
|
|
// DfgVertex::user is the next pointer of the work list elements
|
|
|
|
|
const auto userDataInUse = m_dfg.userDataInUse();
|
|
|
|
|
|
|
|
|
|
// Add all vertices to the work list. This also allocates all DfgVertex::user.
|
|
|
|
|
for (DfgVertex *vtxp = m_dfg.opVerticesBeginp(), *nextp; vtxp; vtxp = nextp) {
|
|
|
|
|
nextp = vtxp->verticesNext();
|
|
|
|
|
if (VL_LIKELY(nextp)) VL_PREFETCH_RW(nextp);
|
|
|
|
|
vtxp->setUser<DfgVertex*>(m_workListp);
|
|
|
|
|
m_workListp = vtxp;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Process the work list
|
|
|
|
|
while (m_workListp != reinterpret_cast<DfgVertex*>(this)) {
|
|
|
|
|
// Pick up the head
|
|
|
|
|
DfgVertex* const vtxp = m_workListp;
|
|
|
|
|
// Detach the head and prefetch next
|
|
|
|
|
m_workListp = vtxp->getUser<DfgVertex*>();
|
|
|
|
|
VL_PREFETCH_RW(m_workListp);
|
|
|
|
|
vtxp->setUser<DfgVertex*>(nullptr);
|
2022-11-06 16:42:01 +01:00
|
|
|
// Remove unused vertices as we gp
|
|
|
|
|
if (!vtxp->hasSinks()) {
|
|
|
|
|
deleteVertex(vtxp);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
// Transform node (might get deleted in the process)
|
|
|
|
|
iterate(vtxp);
|
2022-10-06 13:02:46 +02:00
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
public:
|
2022-10-06 13:02:46 +02:00
|
|
|
static void apply(DfgGraph& dfg, V3DfgPeepholeContext& ctx) { V3DfgPeephole{dfg, ctx}; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
void V3DfgPasses::peephole(DfgGraph& dfg, V3DfgPeepholeContext& ctx) {
|
|
|
|
|
V3DfgPeephole::apply(dfg, ctx);
|
|
|
|
|
}
|