Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
Its capabilities cover a combination of V3Const and V3Gate, but it is
also more capable of transforming combinational logic into simplified
forms and more.
This entails adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. The bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
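As a rough illustration of the common sub-expression elimination mentioned above, the sketch below merges structurally identical vertices when they are created. It is a toy model only: `ToyVertex`, `ToyGraph` and their members are hypothetical names, not part of Verilator's `DfgGraph` API, and the real pass may well work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy stand-in for a data-flow graph vertex. All names here are hypothetical,
// not Verilator's DfgVertex API.
struct ToyVertex {
    std::string op;    // operation, e.g. "var", "and", "or"
    std::string name;  // variable name, only used when op == "var"
    int lhs;           // index of the left operand vertex, or -1
    int rhs;           // index of the right operand vertex, or -1
};

class ToyGraph final {
    using Key = std::tuple<std::string, std::string, int, int>;
    std::vector<ToyVertex> m_vertices;  // all vertices, indexed by id
    std::map<Key, int> m_known;         // structural key -> existing vertex id

    // Return the id of a structurally identical vertex if one exists,
    // otherwise create a new vertex. This lookup is what performs the CSE.
    int make(const std::string& op, const std::string& name, int lhs, int rhs) {
        const Key key{op, name, lhs, rhs};
        const auto it = m_known.find(key);
        if (it != m_known.end()) return it->second;
        const int id = static_cast<int>(m_vertices.size());
        m_vertices.push_back(ToyVertex{op, name, lhs, rhs});
        m_known.emplace(key, id);
        return id;
    }

    bool valid(int id) const {
        return id >= 0 && id < static_cast<int>(m_vertices.size());
    }

public:
    // Create (or reuse) a variable vertex
    int var(const std::string& name) { return make("var", name, -1, -1); }
    // Create (or reuse) a binary operation over two existing vertices
    int binary(const std::string& op, int lhs, int rhs) {
        assert(valid(lhs) && valid(rhs));
        return make(op, "", lhs, rhs);
    }
    std::size_t size() const { return m_vertices.size(); }
};
```

In this sketch, building `a & b` twice returns the same vertex id, so the graph ends up with three vertices rather than four.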
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
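To make the idea of an individually switchable peephole pattern concrete, here is a minimal sketch of a "remove not-not" rewrite on a toy expression tree. Everything in it is an assumption made for illustration: `ToyExpr`, `makeVar`, `makeNot`, `countNots` and `removeNotNot` are hypothetical names, and the `enabled` parameter only models the effect of passing or omitting -fno-dfg-peephole-remove-not-not. It is not how V3DfgPeephole is implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (hypothetical, not a Verilator type)
struct ToyExpr {
    std::string op;                    // "var" or "not"
    std::string name;                  // variable name when op == "var"
    std::unique_ptr<ToyExpr> operand;  // operand when op == "not"
};

std::unique_ptr<ToyExpr> makeVar(const std::string& name) {
    std::unique_ptr<ToyExpr> exprp{new ToyExpr};
    exprp->op = "var";
    exprp->name = name;
    return exprp;
}

std::unique_ptr<ToyExpr> makeNot(std::unique_ptr<ToyExpr> operand) {
    assert(operand);
    std::unique_ptr<ToyExpr> exprp{new ToyExpr};
    exprp->op = "not";
    exprp->operand = std::move(operand);
    return exprp;
}

// Count the chain of "not" nodes at the top of an expression
int countNots(const ToyExpr* exprp) {
    int n = 0;
    while (exprp && exprp->op == "not") {
        ++n;
        exprp = exprp->operand.get();
    }
    return n;
}

// Rewrite Not(Not(x)) to x, working bottom-up. 'enabled' is a hypothetical
// switch: false leaves the expression untouched, as if the pattern had been
// disabled on the command line.
std::unique_ptr<ToyExpr> removeNotNot(std::unique_ptr<ToyExpr> exprp, bool enabled) {
    if (!enabled || !exprp || exprp->op != "not") return exprp;
    // Simplify the operand first, so longer chains collapse pairwise
    exprp->operand = removeNotNot(std::move(exprp->operand), enabled);
    if (exprp->operand && exprp->operand->op == "not") {
        // Not(Not(x)): drop both negations and hand back x
        return std::move(exprp->operand->operand);
    }
    return exprp;
}
```

With the pattern enabled, a chain of three negations reduces to one and a chain of two disappears. With it disabled, the expression is returned unchanged, which is the behavior one would want when bisecting a suspected optimizer bug.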
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yield ~30% single threaded speedup, and ~50% speedup on 4 threads. Not
having to haul around redundant combinational networks in the rest of
the compilation pipeline also helps with memory consumption: a reduction
of up to 30% in Verilator's peak memory usage was observed on the same
design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
// -*- mode: C++; c-file-style: "cc-mode" -*-
//*************************************************************************
// DESCRIPTION: Verilator: Convert AstModule to DfgGraph
//
// Code available from: https://verilator.org
//
//*************************************************************************
//
// Copyright 2003-2025 by Wilson Snyder. This program is free software; you
// can redistribute it and/or modify it under the terms of either the GNU
// Lesser General Public License Version 3 or the Perl Artistic License
// Version 2.0.
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
//
//*************************************************************************
//
// Convert an AstModule to a DfgGraph. We proceed by visiting convertible logic blocks (e.g.:
// AstAssignW of appropriate type and with no delays), recursively constructing DfgVertex instances
// for the expressions that compose the subject logic block. If all expressions in the current
// logic block can be converted, then we delete the logic block (now represented in the DfgGraph),
// and connect the corresponding DfgVertex instances appropriately. If some of the expressions were
// not convertible in the current logic block, we revert (delete) the DfgVertex instances created
// for the logic block, and leave the logic block in the AstModule. Any variable references from
// non-converted logic blocks (or other constructs under the AstModule) are marked as being
// referenced in the AstModule, which is relevant for later optimization.
//
//*************************************************************************

#include "V3PchAstNoMT.h"  // VL_MT_DISABLED_CODE_UNIT

#include "V3Const.h"
#include "V3Dfg.h"
#include "V3DfgPasses.h"

#include <iterator>

VL_DEFINE_DEBUG_FUNCTIONS;

namespace {

// Create a DfgVertex out of an AstNodeExpr. For most AstNodeExpr subtypes, this can be done
// automatically. For the few special cases, we provide specializations below
template <typename T_Vertex, typename T_Node>
T_Vertex* makeVertex(const T_Node* nodep, DfgGraph& dfg) {
    return new T_Vertex{dfg, nodep->fileline(), DfgVertex::dtypeFor(nodep)};
}

template <>
DfgArraySel* makeVertex<DfgArraySel, AstArraySel>(const AstArraySel* nodep, DfgGraph& dfg) {
    // Some earlier passes create malformed ArraySels, just bail on those...
    // See t_bitsel_wire_array_bad
    if (VN_IS(nodep->fromp(), Const)) return nullptr;
    AstUnpackArrayDType* const fromDtypep
        = VN_CAST(nodep->fromp()->dtypep()->skipRefp(), UnpackArrayDType);
    if (!fromDtypep) return nullptr;
    return new DfgArraySel{dfg, nodep->fileline(), DfgVertex::dtypeFor(nodep)};
}

//======================================================================
// Currently unhandled nodes
// LCOV_EXCL_START
// AstCCast changes width, but should not exist where DFG optimization is currently invoked
template <>
DfgCCast* makeVertex<DfgCCast, AstCCast>(const AstCCast*, DfgGraph&) {
    return nullptr;
}
// Unhandled in DfgToAst, but also operates on strings which we don't optimize anyway
template <>
DfgAtoN* makeVertex<DfgAtoN, AstAtoN>(const AstAtoN*, DfgGraph&) {
    return nullptr;
}
// Unhandled in DfgToAst, but also operates on strings which we don't optimize anyway
template <>
DfgCompareNN* makeVertex<DfgCompareNN, AstCompareNN>(const AstCompareNN*, DfgGraph&) {
    return nullptr;
}
// Unhandled in DfgToAst, but also operates on unpacked arrays which we don't optimize anyway
template <>
DfgSliceSel* makeVertex<DfgSliceSel, AstSliceSel>(const AstSliceSel*, DfgGraph&) {
|
    return nullptr;
}
// LCOV_EXCL_STOP

} // namespace

// Visitor that can convert combinational Ast logic constructs/assignments to Dfg
template <bool T_Scoped>
class AstToDfgConverter final : public VNVisitor {
    // NODE STATE
    // AstNodeExpr/AstVar/AstVarScope::user2p -> DfgVertex* for this Node

    // TYPES
    using Variable = std::conditional_t<T_Scoped, AstVarScope, AstVar>;

    // STATE

    DfgGraph& m_dfg; // The graph being built
    V3DfgAstToDfgContext& m_ctx; // The context for stats
    bool m_foundUnhandled = false; // Found node not implemented as DFG or not implemented 'visit'
    bool m_converting = false; // We are trying to convert some logic at the moment
    std::vector<DfgVertexSplice*> m_uncommittedSpliceps; // New splices made during convertLValue

    // METHODS
    static Variable* getTarget(const AstVarRef* refp) {
        // TODO: remove the useless reinterpret_casts when C++17 'if constexpr' actually works
        if VL_CONSTEXPR_CXX17 (T_Scoped) {
            return reinterpret_cast<Variable*>(refp->varScopep());
        } else {
            return reinterpret_cast<Variable*>(refp->varp());
        }
    }

    DfgVertexVar* getNet(Variable* varp) {
        if (!varp->user2p()) {
            AstNodeDType* const dtypep = varp->dtypep()->skipRefp();
            DfgVertexVar* const vtxp
                = VN_IS(dtypep, UnpackArrayDType)
                      ? static_cast<DfgVertexVar*>(new DfgVarArray{m_dfg, varp})
                      : static_cast<DfgVertexVar*>(new DfgVarPacked{m_dfg, varp});
            varp->user2p(vtxp);
        }
        return varp->user2u().template to<DfgVertexVar*>();
    }

    // Returns true if the expression cannot (or should not) be represented by DFG
    bool unhandled(AstNodeExpr* nodep) {
        // Short-circuiting if something was already unhandled
        if (!m_foundUnhandled) {
            // Impure nodes cannot be represented
            if (!nodep->isPure()) {
                m_foundUnhandled = true;
                ++m_ctx.m_nonRepImpure;
            }
            // Check node has supported dtype
            if (!DfgVertex::isSupportedDType(nodep->dtypep())) {
                m_foundUnhandled = true;
                ++m_ctx.m_nonRepDType;
            }
        }
        return m_foundUnhandled;
    }

    bool isSupported(const AstVar* varp) {
        if (varp->isIfaceRef()) return false; // Cannot handle interface references
        if (varp->delayp()) return false; // Cannot handle delayed variables
        if (varp->isSc()) return false; // SystemC variables are special and rare, we can ignore
        return DfgVertex::isSupportedDType(varp->dtypep());
    }

    bool isSupported(const AstVarScope* vscp) {
        // Check the Var first
        if (!isSupported(vscp->varp())) return false;
        // If the variable is not in a regular module, then do not convert it.
        // This is especially needed for variables in interfaces which might be
        // referenced via virtual interfaces, which cannot be resolved statically.
        if (!VN_IS(vscp->scopep()->modp(), Module)) return false;
        // Otherwise OK
        return true;
    }

    bool isSupported(const AstVarRef* nodep) {
        // Cannot represent cross module references
        if (nodep->classOrPackagep()) return false;
        // Check target
        return isSupported(getTarget(nodep));
    }

    // Given an RValue expression, return the equivalent Vertex, or nullptr if not representable.
    DfgVertex* convertRValue(AstNodeExpr* nodep) {
        UASSERT_OBJ(!m_converting, nodep, "'convertRValue' should not be called recursively");
        VL_RESTORER(m_converting);
        VL_RESTORER(m_foundUnhandled);
        m_converting = true;
        m_foundUnhandled = false;

        // Convert the expression
        iterate(nodep);

        // If failed to convert, return nullptr
        if (m_foundUnhandled) return nullptr;

        // Traversal set user2p to the equivalent vertex
        DfgVertex* const vtxp = nodep->user2u().to<DfgVertex*>();
        UASSERT_OBJ(vtxp, nodep, "Missing Dfg vertex after conversion");
        return vtxp;
    }

    // Given an LValue expression, return the splice node that writes the
    // destination, together with the index to use for splicing in the value.
    // Returns {nullptr, 0}, if the given LValue expression is not supported.
    std::pair<DfgVertexSplice*, uint32_t> convertLValue(AstNodeExpr* nodep) {
        if (AstVarRef* const vrefp = VN_CAST(nodep, VarRef)) {
            if (!isSupported(vrefp)) {
                ++m_ctx.m_nonRepLhs;
                return {nullptr, 0};
            }
            // Get the variable vertex
            DfgVertexVar* const vtxp = getNet(getTarget(vrefp));
            // Ensure the Splice driver exists for this variable
            if (!vtxp->srcp()) {
                FileLine* const flp = vtxp->fileline();
                AstNodeDType* const dtypep = vtxp->dtypep();
                if (vtxp->is<DfgVarPacked>()) {
                    DfgSplicePacked* const newp = new DfgSplicePacked{m_dfg, flp, dtypep};
                    m_uncommittedSpliceps.emplace_back(newp);
                    vtxp->srcp(newp);
                } else if (vtxp->is<DfgVarArray>()) {
                    DfgSpliceArray* const newp = new DfgSpliceArray{m_dfg, flp, dtypep};
                    m_uncommittedSpliceps.emplace_back(newp);
                    vtxp->srcp(newp);
                } else {
                    nodep->v3fatalSrc("Unhandled DfgVertexVar sub-type"); // LCOV_EXCL_LINE
                }
            }
            // Return the Splice driver
            return {vtxp->srcp()->as<DfgVertexSplice>(), 0};
        }

        if (AstSel* selp = VN_CAST(nodep, Sel)) {
            // Only handle constant selects
            const AstConst* const lsbp = VN_CAST(selp->lsbp(), Const);
            if (!lsbp) {
                ++m_ctx.m_nonRepLhs;
                return {nullptr, 0};
            }
            uint32_t lsb = lsbp->toUInt();

            // Convert the 'fromp' sub-expression
            const auto pair = convertLValue(selp->fromp());
            if (!pair.first) return {nullptr, 0};
            DfgSplicePacked* const splicep = pair.first->template as<DfgSplicePacked>();
            // Adjust index.
            lsb += pair.second;

            // AstSel doesn't change type kind (array vs packed), so we can use
            // the existing splice driver with adjusted lsb
            return {splicep, lsb};
        }

        if (AstArraySel* const aselp = VN_CAST(nodep, ArraySel)) {
            // Only handle constant selects
            const AstConst* const indexp = VN_CAST(aselp->bitp(), Const);
            if (!indexp) {
                ++m_ctx.m_nonRepLhs;
                return {nullptr, 0};
            }
            uint32_t index = indexp->toUInt();

            // Convert the 'fromp' sub-expression
            const auto pair = convertLValue(aselp->fromp());
            if (!pair.first) return {nullptr, 0};
            DfgSpliceArray* const splicep = pair.first->template as<DfgSpliceArray>();
            // Adjust index. Note pair.second is always 0, but we might handle array slices later..
            index += pair.second;

            // Ensure the Splice driver exists for this element
            if (!splicep->driverAt(index)) {
                FileLine* const flp = nodep->fileline();
                AstNodeDType* const dtypep = DfgVertex::dtypeFor(nodep);
                if (VN_IS(dtypep, BasicDType)) {
                    DfgSplicePacked* const newp = new DfgSplicePacked{m_dfg, flp, dtypep};
                    m_uncommittedSpliceps.emplace_back(newp);
                    splicep->addDriver(flp, index, newp);
                } else if (VN_IS(dtypep, UnpackArrayDType)) {
                    DfgSpliceArray* const newp = new DfgSpliceArray{m_dfg, flp, dtypep};
                    m_uncommittedSpliceps.emplace_back(newp);
                    splicep->addDriver(flp, index, newp);
                } else {
                    nodep->v3fatalSrc("Unhandled AstNodeDType sub-type"); // LCOV_EXCL_LINE
                }
            }

            // Return the splice driver
            return {splicep->driverAt(index)->as<DfgVertexSplice>(), 0};
        }

        ++m_ctx.m_nonRepLhs;
        return {nullptr, 0};
    }

    // Given the LHS of an assignment, and the vertex representing the RHS,
    // connect up the RHS to drive the targets.
    // Returns true on success, false if the LHS is not representable.
    bool convertAssignment(FileLine* flp, AstNodeExpr* lhsp, DfgVertex* vtxp) {
        // Represents a DFG assignment contributed by the AST assignment with the above 'lhsp'.
        // There might be multiple of these if 'lhsp' is a concatenation.
        struct Assignment final {
            DfgVertexSplice* m_lhsp;
            uint32_t m_idx;
            DfgVertex* m_rhsp;
            Assignment() = delete;
            Assignment(DfgVertexSplice* lhsp, uint32_t idx, DfgVertex* rhsp)
                : m_lhsp{lhsp}
                , m_idx{idx}
                , m_rhsp{rhsp} {}
        };

        // Convert each concatenation LHS separately, gather all assignments
        // we need to do into 'assignments', return true if all LValues
        // converted successfully.
        std::vector<Assignment> assignments;
        const std::function<bool(AstNodeExpr*, DfgVertex*)> convertAllLValues
            = [&](AstNodeExpr* lhsp, DfgVertex* vtxp) -> bool {
            // Simplify the LHS, to get rid of things like SEL(CONCAT(_, _), _)
            lhsp = VN_AS(V3Const::constifyExpensiveEdit(lhsp), NodeExpr);

            // Concatenation on the LHS, convert each part
            if (AstConcat* const concatp = VN_CAST(lhsp, Concat)) {
                AstNodeExpr* const cLhsp = concatp->lhsp();
                AstNodeExpr* const cRhsp = concatp->rhsp();
                // Convert left of concat
                FileLine* const lFlp = cLhsp->fileline();
                DfgSel* const lVtxp = new DfgSel{m_dfg, lFlp, DfgVertex::dtypeFor(cLhsp)};
                lVtxp->fromp(vtxp);
                lVtxp->lsb(cRhsp->width());
                if (!convertAllLValues(cLhsp, lVtxp)) return false;
                // Convert right of concat
                FileLine* const rFlp = cRhsp->fileline();
                DfgSel* const rVtxp = new DfgSel{m_dfg, rFlp, DfgVertex::dtypeFor(cRhsp)};
                rVtxp->fromp(vtxp);
                rVtxp->lsb(0);
                return convertAllLValues(cRhsp, rVtxp);
            }

            // Non-concatenation, convert the LValue
            const auto pair = convertLValue(lhsp);
            if (!pair.first) return false;
            assignments.emplace_back(pair.first, pair.second, vtxp);
            return true;
        };
|
|
|
|
// Convert the given LHS assignment, give up if any LValues failed to convert
|
2025-08-08 23:53:12 +02:00
|
|
|
if (!convertAllLValues(lhsp, vtxp)) {
|
|
|
|
|
for (DfgVertexSplice* const splicep : m_uncommittedSpliceps) {
|
|
|
|
|
VL_DO_DANGLING(splicep->unlinkDelete(m_dfg), splicep);
|
|
|
|
|
}
|
|
|
|
|
m_uncommittedSpliceps.clear();
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
m_uncommittedSpliceps.clear();
|
2025-08-03 15:52:20 +02:00
|
|
|
|
|
|
|
|
// All successful, connect the drivers
|
|
|
|
|
for (const Assignment& a : assignments) {
|
|
|
|
|
if (DfgSplicePacked* const spp = a.m_lhsp->template cast<DfgSplicePacked>()) {
|
|
|
|
|
spp->addDriver(flp, a.m_idx, a.m_rhsp);
|
|
|
|
|
} else if (DfgSpliceArray* const sap = a.m_lhsp->template cast<DfgSpliceArray>()) {
|
|
|
|
|
sap->addDriver(flp, a.m_idx, a.m_rhsp);
|
|
|
|
|
} else {
|
|
|
|
|
a.m_lhsp->v3fatalSrc("Unhandled DfgVertexSplice sub-type"); // LCOV_EXCL_LINE
|
|
|
|
|
}
|
2025-07-14 23:09:34 +02:00
|
|
|
}
|
2025-07-21 18:33:12 +02:00
|
|
|
return true;
|
2022-09-25 17:03:15 +02:00
|
|
|
}
|
|
|
|
|
|
2025-08-08 23:53:12 +02:00
|
|
|
    // Convert the assignment with the given LHS and RHS into DFG.
    // Returns true on success, false if not representable.
    bool convertEquation(FileLine* flp, AstNodeExpr* lhsp, AstNodeExpr* rhsp) {
        // Check data types are compatible.
        if (!DfgVertex::isSupportedDType(lhsp->dtypep())
            || !DfgVertex::isSupportedDType(rhsp->dtypep())) {
            ++m_ctx.m_nonRepDType;
            return false;
        }

        // For now, only direct array assignment is supported (e.g. a = b, but not a = _ ? b : c)
        if (VN_IS(rhsp->dtypep()->skipRefp(), UnpackArrayDType) && !VN_IS(rhsp, VarRef)) {
            ++m_ctx.m_nonRepDType;
            return false;
        }

        // Cannot handle mismatched widths. Mismatched assignments should have been fixed up in
        // earlier passes anyway, so this should never be hit, but being paranoid just in case.
        if (lhsp->width() != rhsp->width()) {  // LCOV_EXCL_START
            ++m_ctx.m_nonRepWidth;
            return false;
        }  // LCOV_EXCL_STOP

        // Convert the RHS expression
        DfgVertex* const rVtxp = convertRValue(rhsp);
        if (!rVtxp) return false;

        // Connect the RHS vertex to the LHS targets
        if (!convertAssignment(flp, lhsp, rVtxp)) return false;

        // All good
        ++m_ctx.m_representable;
        return true;
    }

    // Convert an AstNodeAssign (AstAssign or AstAssignW)
    bool convertNodeAssign(AstNodeAssign* nodep) {
        UASSERT_OBJ(VN_IS(nodep, AssignW) || VN_IS(nodep, Assign), nodep, "Invalid subtype");
        ++m_ctx.m_inputEquations;

        // Cannot handle assignment with timing control yet
        if (nodep->timingControlp()) {
            ++m_ctx.m_nonRepTiming;
            return false;
        }

        return convertEquation(nodep->fileline(), nodep->lhsp(), nodep->rhsp());
    }

    // Convert special simple form Always block into DFG.
    // Returns true on success, false if not representable/not simple.
    bool convertSimpleAlways(AstAlways* nodep) {
        // Only consider single statement block
        if (!nodep->isJustOneBodyStmt()) return false;

        AstNode* const stmtp = nodep->stmtsp();

        if (AstAssign* const assignp = VN_CAST(stmtp, Assign)) {
            return convertNodeAssign(assignp);
        }

        if (AstIf* const ifp = VN_CAST(stmtp, If)) {
            // Will only handle single assignments to the same LHS in both branches
            AstAssign* const thenp = VN_CAST(ifp->thensp(), Assign);
            AstAssign* const elsep = VN_CAST(ifp->elsesp(), Assign);
            if (!thenp || !elsep || thenp->nextp() || elsep->nextp()
                || !thenp->lhsp()->sameTree(elsep->lhsp())) {
                return false;
            }

            ++m_ctx.m_inputEquations;
            if (thenp->timingControlp() || elsep->timingControlp()) {
                ++m_ctx.m_nonRepTiming;
                return false;
            }

            // Create a conditional for the rhs by borrowing the components from the AstIf
            AstCond* const rhsp = new AstCond{ifp->fileline(),  //
                                              ifp->condp()->unlinkFrBack(),  //
                                              thenp->rhsp()->unlinkFrBack(),  //
                                              elsep->rhsp()->unlinkFrBack()};
            const bool success = convertEquation(ifp->fileline(), thenp->lhsp(), rhsp);
            // Put the AstIf back together
            ifp->condp(rhsp->condp()->unlinkFrBack());
            thenp->rhsp(rhsp->thenp()->unlinkFrBack());
            elsep->rhsp(rhsp->elsep()->unlinkFrBack());
            // Delete the auxiliary conditional
            VL_DO_DANGLING(rhsp->deleteTree(), rhsp);
            return success;
        }

        return false;
    }

    // VISITORS

    // Unhandled node
    void visit(AstNode* nodep) override {
        if (!m_foundUnhandled && m_converting) ++m_ctx.m_nonRepUnknown;
        m_foundUnhandled = true;
    }

    // Expressions - mostly auto generated, but a few special ones
    void visit(AstVarRef* nodep) override {
        UASSERT_OBJ(m_converting, nodep, "AstToDfg visit called without m_converting");
        UASSERT_OBJ(!nodep->user2p(), nodep, "Already has Dfg vertex");
        if (unhandled(nodep)) return;
        // This visit method is only called on RValues, where only read refs are supported
        if (!nodep->access().isReadOnly() || !isSupported(nodep)) {
            m_foundUnhandled = true;
            ++m_ctx.m_nonRepVarRef;
            return;
        }
        nodep->user2p(getNet(getTarget(nodep)));
    }
    void visit(AstConst* nodep) override {
        UASSERT_OBJ(m_converting, nodep, "AstToDfg visit called without m_converting");
        UASSERT_OBJ(!nodep->user2p(), nodep, "Already has Dfg vertex");
        if (unhandled(nodep)) return;
        DfgVertex* const vtxp = new DfgConst{m_dfg, nodep->fileline(), nodep->num()};
        nodep->user2p(vtxp);
    }
    void visit(AstSel* nodep) override {
        UASSERT_OBJ(m_converting, nodep, "AstToDfg visit called without m_converting");
        UASSERT_OBJ(!nodep->user2p(), nodep, "Already has Dfg vertex");
        if (unhandled(nodep)) return;

        iterate(nodep->fromp());
        if (m_foundUnhandled) return;

        FileLine* const flp = nodep->fileline();
        DfgVertex* vtxp = nullptr;
        if (AstConst* const constp = VN_CAST(nodep->lsbp(), Const)) {
            DfgSel* const selp = new DfgSel{m_dfg, flp, DfgVertex::dtypeFor(nodep)};
            selp->fromp(nodep->fromp()->user2u().to<DfgVertex*>());
            selp->lsb(constp->toUInt());
            vtxp = selp;
        } else {
            iterate(nodep->lsbp());
            if (m_foundUnhandled) return;
            DfgMux* const muxp = new DfgMux{m_dfg, flp, DfgVertex::dtypeFor(nodep)};
            muxp->fromp(nodep->fromp()->user2u().to<DfgVertex*>());
            muxp->lsbp(nodep->lsbp()->user2u().to<DfgVertex*>());
            vtxp = muxp;
        }
        nodep->user2p(vtxp);
    }
    // The rest of the visit methods for expressions are generated by 'astgen'
#include "V3Dfg__gen_ast_to_dfg.h"

public:
    // PUBLIC METHODS

    // Convert AstAssignW to Dfg, return true if successful.
    bool convert(AstAssignW* nodep) {
        if (convertNodeAssign(nodep)) {
            // Remove node from Ast. Now represented by the Dfg.
            VL_DO_DANGLING(nodep->unlinkFrBack()->deleteTree(), nodep);
            return true;
        }

        return false;
    }

    // Convert AstAlways to Dfg, return true if successful.
    bool convert(AstAlways* nodep) {
        // Ignore sequential logic
        const VAlwaysKwd kwd = nodep->keyword();
        if (nodep->sensesp() || (kwd != VAlwaysKwd::ALWAYS && kwd != VAlwaysKwd::ALWAYS_COMB)) {
            return false;
        }

        // Attempt to convert special forms
        if (convertSimpleAlways(nodep)) {
            // Remove node from Ast. Now represented by the Dfg.
            VL_DO_DANGLING(nodep->unlinkFrBack()->deleteTree(), nodep);
            return true;
        }

        return false;
    }

    // CONSTRUCTOR
    AstToDfgConverter(DfgGraph& dfg, V3DfgAstToDfgContext& ctx)
        : m_dfg{dfg}
        , m_ctx{ctx} {}
};
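
// Illustration only (not part of Verilator): a minimal standalone sketch of the idea behind
// convertAssignment, where an RHS assigned to a concatenated LHS such as {a, {b, c}} is split
// so that each leaf receives its own slice of the RHS. The names Lhs, Slice, leaf, concat and
// splitLhs are hypothetical. Unlike the converter, which builds nested DfgSel vertices whose
// offsets are relative to the enclosing select, this sketch accumulates absolute bit offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an LHS expression: a named leaf or a concatenation
struct Lhs {
    std::string name;  // Leaf name, empty for a concatenation
    uint32_t width = 0;  // Total width in bits
    std::unique_ptr<Lhs> lhsp;  // Most significant part (concatenation only)
    std::unique_ptr<Lhs> rhsp;  // Least significant part (concatenation only)
};

// Records which bits of the RHS value drive which leaf
struct Slice {
    std::string name;  // Leaf being driven
    uint32_t lsb;  // Least significant RHS bit of the slice
    uint32_t width;  // Number of bits in the slice
};

std::unique_ptr<Lhs> leaf(const std::string& name, uint32_t width) {
    auto p = std::make_unique<Lhs>();
    p->name = name;
    p->width = width;
    return p;
}

std::unique_ptr<Lhs> concat(std::unique_ptr<Lhs> l, std::unique_ptr<Lhs> r) {
    auto p = std::make_unique<Lhs>();
    p->width = l->width + r->width;
    p->lhsp = std::move(l);
    p->rhsp = std::move(r);
    return p;
}

// Walk the LHS tree; 'lsb' is the RHS bit offset at which 'node' starts
void splitLhs(const Lhs& node, uint32_t lsb, std::vector<Slice>& out) {
    if (node.lhsp) {
        // The left part sits above the right part, so it starts at the right part's width
        splitLhs(*node.lhsp, lsb + node.rhsp->width, out);
        splitLhs(*node.rhsp, lsb, out);
        return;
    }
    out.push_back({node.name, lsb, node.width});
}
```

// For {a, {b, c}} with widths 3, 4 and 1, splitLhs(root, 0, out) records 'a' at lsb 5,
// 'b' at lsb 1 and 'c' at lsb 0, in that order.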

// Resolves multiple drivers (keep only the first one),
// and ensures drivers are stored in ascending index order
class AstToDfgNormalizeDrivers final {
    // TYPES
    struct Driver final {
        FileLine* m_flp;  // Location of driver in source
        uint32_t m_low;  // Low index of driven range
        DfgVertex* m_vtxp;  // Driving vertex
        Driver() = delete;
        Driver(FileLine* flp, uint32_t low, DfgVertex* vtxp)
            : m_flp{flp}
            , m_low{low}
            , m_vtxp{vtxp} {}
    };

    // STATE
    DfgGraph& m_dfg;  // The graph being processed
    DfgVertexVar& m_var;  // The variable being normalized

    // METHODS

    // Normalize packed driver
    void normalizePacked(const std::string& sub, DfgSplicePacked* const splicep) {
        UASSERT_OBJ(splicep->arity() >= 1, splicep, "Undriven DfgSplicePacked");

        // The drivers of 'splicep'
        std::vector<Driver> drivers;
        drivers.reserve(splicep->arity());

        // Sometimes assignment ranges are coalesced by V3Const,
        // so we unpack concatenations for better error reporting.
        const std::function<void(FileLine*, uint32_t, DfgVertex*)> gather
            = [&](FileLine* flp, uint32_t lsb, DfgVertex* vtxp) -> void {
            if (DfgConcat* const concatp = vtxp->cast<DfgConcat>()) {
                DfgVertex* const rhsp = concatp->rhsp();
                auto const rhs_width = rhsp->width();
                gather(rhsp->fileline(), lsb, rhsp);
                DfgVertex* const lhsp = concatp->lhsp();
                gather(lhsp->fileline(), lsb + rhs_width, lhsp);
                concatp->unlinkDelete(m_dfg);
            } else {
                drivers.emplace_back(flp, lsb, vtxp);
            }
        };

        // Gather and unlink all drivers
        splicep->forEachSourceEdge([&](DfgEdge& edge, size_t i) {
            DfgVertex* const driverp = edge.sourcep();
            UASSERT(driverp, "Should not have created undriven sources");
            UASSERT_OBJ(!driverp->is<DfgVertexSplice>(), splicep, "Should not be DfgVertexSplice");
            gather(splicep->driverFileLine(i), splicep->driverLsb(i), driverp);
            edge.unlinkSource();
        });
        splicep->resetSources();

        const auto cmp = [](const Driver& a, const Driver& b) {
            if (a.m_low != b.m_low) return a.m_low < b.m_low;
            return a.m_flp->operatorCompare(*b.m_flp) < 0;
        };

        // Sort drivers by LSB
        std::stable_sort(drivers.begin(), drivers.end(), cmp);

        // Fix multiply driven ranges
        for (auto it = drivers.begin(); it != drivers.end();) {
            Driver& a = *it++;
            const uint32_t aWidth = a.m_vtxp->width();
            const uint32_t aEnd = a.m_low + aWidth;
            while (it != drivers.end()) {
                Driver& b = *it;
                // If no overlap, then nothing to do
                if (b.m_low >= aEnd) break;

                const uint32_t bWidth = b.m_vtxp->width();
                const uint32_t bEnd = b.m_low + bWidth;
                const uint32_t overlapEnd = std::min(aEnd, bEnd) - 1;

                // Loop index often abused, so suppress
                if (!m_var.varp()->isUsedLoopIdx()) {
                    AstNode* const nodep = m_var.nodep();
                    nodep->v3warn(  //
                        MULTIDRIVEN,
                        "Bits ["  //
                            << overlapEnd << ":" << b.m_low << "] of signal '"
                            << nodep->prettyName() << sub
                            << "' have multiple combinational drivers\n"
                            << a.m_flp->warnOther() << "... Location of first driver\n"
                            << a.m_flp->warnContextPrimary() << '\n'
                            << b.m_flp->warnOther() << "... Location of other driver\n"
                            << b.m_flp->warnContextSecondary() << nodep->warnOther()
                            << "... Only the first driver will be respected");
                }

                // If the first driver completely covers the range of the second driver,
                // we can just delete the second driver completely, otherwise adjust the
                // second driver to apply from the end of the range of the first driver.
                if (aEnd >= bEnd) {
                    it = drivers.erase(it);
                } else {
                    const auto dtypep = DfgVertex::dtypeForWidth(bEnd - aEnd);
                    DfgSel* const selp = new DfgSel{m_dfg, b.m_vtxp->fileline(), dtypep};
                    selp->fromp(b.m_vtxp);
                    selp->lsb(aEnd - b.m_low);
                    b.m_low = aEnd;
                    b.m_vtxp = selp;
                    std::stable_sort(it, drivers.end(), cmp);
                }
            }
        }

        // Reinsert drivers in order
        for (const Driver& d : drivers) splicep->addDriver(d.m_flp, d.m_low, d.m_vtxp);
    }

    // Normalize array driver
    void normalizeArray(const std::string& sub, DfgSpliceArray* const splicep) {
        UASSERT_OBJ(splicep->arity() >= 1, splicep, "Undriven DfgSpliceArray");

        // The drivers of 'splicep'
        std::vector<Driver> drivers;
        drivers.reserve(splicep->arity());

        // Normalize, gather, and unlink all drivers
        splicep->forEachSourceEdge([&](DfgEdge& edge, size_t i) {
            DfgVertex* const driverp = edge.sourcep();
            UASSERT(driverp, "Should not have created undriven sources");
            const uint32_t idx = splicep->driverIndex(i);
            // Normalize
            if (DfgSplicePacked* const splicePackedp = driverp->cast<DfgSplicePacked>()) {
                normalizePacked(sub + "[" + std::to_string(idx) + "]", splicePackedp);
            } else if (DfgSpliceArray* const spliceArrayp = driverp->cast<DfgSpliceArray>()) {
                normalizeArray(sub + "[" + std::to_string(idx) + "]", spliceArrayp);
            } else if (driverp->is<DfgVertexSplice>()) {
                driverp->v3fatalSrc("Unhandled DfgVertexSplice sub-type");  // LCOV_EXCL_LINE
            }
            // Gather
            drivers.emplace_back(splicep->driverFileLine(i), idx, driverp);
            // Unlink
            edge.unlinkSource();
        });
        splicep->resetSources();

        const auto cmp = [](const Driver& a, const Driver& b) {
            if (a.m_low != b.m_low) return a.m_low < b.m_low;
            return a.m_flp->operatorCompare(*b.m_flp) < 0;
        };

        // Sort drivers by index
        std::stable_sort(drivers.begin(), drivers.end(), cmp);

        // Fix multiply driven ranges
        for (auto it = drivers.begin(); it != drivers.end();) {
            Driver& a = *it++;
            AstUnpackArrayDType* aArrayDTypep = VN_CAST(a.m_vtxp->dtypep(), UnpackArrayDType);
            const uint32_t aElements = aArrayDTypep ? aArrayDTypep->elementsConst() : 1;
            const uint32_t aEnd = a.m_low + aElements;
            while (it != drivers.end()) {
                Driver& b = *it;
                // If no overlap, then nothing to do
                if (b.m_low >= aEnd) break;

                AstUnpackArrayDType* bArrayDTypep = VN_CAST(b.m_vtxp->dtypep(), UnpackArrayDType);
                const uint32_t bElements = bArrayDTypep ? bArrayDTypep->elementsConst() : 1;
                const uint32_t bEnd = b.m_low + bElements;
                const uint32_t overlapEnd = std::min(aEnd, bEnd) - 1;

                AstNode* const nodep = m_var.nodep();
                nodep->v3warn(  //
                    MULTIDRIVEN,
                    "Elements ["  //
                        << overlapEnd << ":" << b.m_low << "] of signal '" << nodep->prettyName()
                        << sub << "' have multiple combinational drivers\n"
                        << a.m_flp->warnOther() << "... Location of first driver\n"
                        << a.m_flp->warnContextPrimary() << '\n'
                        << b.m_flp->warnOther() << "... Location of other driver\n"
                        << b.m_flp->warnContextSecondary() << nodep->warnOther()
                        << "... Only the first driver will be respected");

                // If the first driver completely covers the range of the second driver,
                // we can just delete the second driver completely, otherwise adjust the
                // second driver to apply from the end of the range of the first driver.
                if (aEnd >= bEnd) {
                    it = drivers.erase(it);
                } else {
                    const auto distance = std::distance(drivers.begin(), it);
                    DfgVertex* const bVtxp = b.m_vtxp;
                    FileLine* const flp = b.m_vtxp->fileline();
                    AstNodeDType* const elemDtypep = DfgVertex::dtypeFor(
                        VN_AS(splicep->dtypep(), UnpackArrayDType)->subDTypep());
                    // Remove this driver
                    it = drivers.erase(it);
                    // Add missing items element-wise
                    for (uint32_t i = aEnd; i < bEnd; ++i) {
                        DfgArraySel* const aselp = new DfgArraySel{m_dfg, flp, elemDtypep};
                        aselp->fromp(bVtxp);
                        aselp->bitp(new DfgConst{m_dfg, flp, 32, i});
                        drivers.emplace_back(flp, i, aselp);
                    }
                    it = drivers.begin();
                    std::advance(it, distance);
                    std::stable_sort(it, drivers.end(), cmp);
                }
            }
        }

        // Reinsert drivers in order
        for (const Driver& d : drivers) splicep->addDriver(d.m_flp, d.m_low, d.m_vtxp);
    }

    // CONSTRUCTOR
    AstToDfgNormalizeDrivers(DfgGraph& dfg, DfgVertexVar& var)
        : m_dfg{dfg}
        , m_var{var} {
        // Nothing to do for un-driven (input) variables
        if (!var.srcp()) return;

        // The driver of a variable must always be a splice vertex, normalize it
        if (DfgSpliceArray* const sArrayp = var.srcp()->cast<DfgSpliceArray>()) {
            normalizeArray("", sArrayp);
        } else if (DfgSplicePacked* const sPackedp = var.srcp()->cast<DfgSplicePacked>()) {
            normalizePacked("", sPackedp);
        } else {
            var.v3fatalSrc("Unhandled DfgVertexSplice sub-type");  // LCOV_EXCL_LINE
        }
    }

public:
    // Normalize drivers of given variable
    static void apply(DfgGraph& dfg, DfgVertexVar& var) { AstToDfgNormalizeDrivers{dfg, var}; }
};
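
// Illustration only (not part of Verilator): a minimal standalone sketch of the
// "first driver wins" overlap resolution performed by normalizePacked, reduced to plain
// bit ranges. The names Range and normalize are hypothetical, and the 'id' field is an
// assumed stand-in for the source-location tie-break that the real code does via FileLine.
// A later driver that is fully covered is dropped, a partially covered one is trimmed to
// start where the earlier driver ends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a driver of a packed variable
struct Range {
    uint32_t low;  // Low bit index driven in the target
    uint32_t width;  // Number of bits driven
    int id;  // Tie-break order, lower means earlier driver (assumption)
    uint32_t skip;  // Bits of the source dropped from its LSB end by trimming
};

std::vector<Range> normalize(std::vector<Range> drivers) {
    const auto cmp = [](const Range& a, const Range& b) {
        if (a.low != b.low) return a.low < b.low;
        return a.id < b.id;
    };
    // Sort by low index, earlier driver first on ties
    std::stable_sort(drivers.begin(), drivers.end(), cmp);

    for (auto it = drivers.begin(); it != drivers.end();) {
        const Range a = *it++;  // Copy, so later erasures cannot affect it
        const uint32_t aEnd = a.low + a.width;
        while (it != drivers.end()) {
            Range& b = *it;
            // No overlap with 'a', move on to the next driver
            if (b.low >= aEnd) break;
            const uint32_t bEnd = b.low + b.width;
            if (aEnd >= bEnd) {
                // 'b' is fully covered by 'a', drop it
                it = drivers.erase(it);
            } else {
                // Trim 'b' so that it starts where 'a' ends
                b.skip += aEnd - b.low;
                b.width = bEnd - aEnd;
                b.low = aEnd;
                // Trimming changed the low index of 'b', restore the ordering
                std::stable_sort(it, drivers.end(), cmp);
            }
        }
    }
    return drivers;
}
```

// Given drivers for bits [7:0], [11:4] and [3:2] (in that source order), the result keeps
// [7:0] whole, drops [3:2], and trims the second driver to [11:8] with its low 4 bits skipped.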

// Coalesce contiguous driver ranges,
// and remove redundant splice vertices (when the variable is driven whole)
class AstToDfgCoalesceDrivers final {
    // TYPES
    struct Driver final {
        FileLine* m_flp;  // Location of driver in source
        uint32_t m_low;  // Low index of driven range
        DfgVertex* m_vtxp;  // Driving vertex
        Driver() = delete;
        Driver(FileLine* flp, uint32_t low, DfgVertex* vtxp)
            : m_flp{flp}
            , m_low{low}
            , m_vtxp{vtxp} {}
    };

    // STATE
    DfgGraph& m_dfg;  // The graph being processed
    V3DfgAstToDfgContext& m_ctx;  // The context for stats

    // METHODS

// Coalesce packed driver - return the coalesced vertex and location for 'splicep'
|
|
|
|
|
std::pair<DfgVertex*, FileLine*> coalescePacked(DfgSplicePacked* const splicep) {
|
|
|
|
|
UASSERT_OBJ(splicep->arity() >= 1, splicep, "Undriven DfgSplicePacked");
|
2022-10-01 13:28:16 +02:00
|
|
|
|
2025-08-08 23:53:12 +02:00
|
|
|
// The drivers of 'splicep'
|
|
|
|
|
std::vector<Driver> drivers;
|
|
|
|
|
drivers.reserve(splicep->arity());
|
2022-10-01 13:28:16 +02:00
|
|
|
|
2025-08-08 23:53:12 +02:00
|
|
|
// Gather and unlink all drivers
|
|
|
|
|
int64_t prevHigh = -1; // High index of previous driven range
|
|
|
|
|
splicep->forEachSourceEdge([&](DfgEdge& edge, size_t i) {
|
|
|
|
|
DfgVertex* const driverp = edge.sourcep();
|
|
|
|
|
UASSERT_OBJ(driverp, splicep, "Should not have created undriven sources");
|
|
|
|
|
UASSERT_OBJ(!driverp->is<DfgVertexSplice>(), splicep, "Should not be DfgVertexSplice");
|
|
|
|
|
const uint32_t low = splicep->driverLsb(i);
|
|
|
|
|
UASSERT_OBJ(static_cast<int64_t>(low) > prevHigh, splicep,
|
|
|
|
|
"Drivers should have been normalized");
            prevHigh = low + driverp->width() - 1;
            // Gather
            drivers.emplace_back(splicep->driverFileLine(i), low, driverp);
            // Unlink
            edge.unlinkSource();
        });
        splicep->resetSources();

        // Coalesce adjacent ranges
        if (drivers.size() > 1) {
            size_t mergeInto = 0;
            size_t mergeFrom = 1;
            do {
                Driver& into = drivers[mergeInto];
                Driver& from = drivers[mergeFrom];
                const uint32_t intoWidth = into.m_vtxp->width();
                const uint32_t fromWidth = from.m_vtxp->width();

                if (into.m_low + intoWidth == from.m_low) {
                    // Adjacent ranges, coalesce
                    const auto dtypep = DfgVertex::dtypeForWidth(intoWidth + fromWidth);
                    DfgConcat* const concatp = new DfgConcat{m_dfg, into.m_flp, dtypep};
                    concatp->rhsp(into.m_vtxp);
                    concatp->lhsp(from.m_vtxp);
                    into.m_vtxp = concatp;
                    from.m_vtxp = nullptr; // Mark as moved
                    ++m_ctx.m_coalescedAssignments;
                } else {
                    // There is a gap - future merges go into the next position
                    ++mergeInto;
                    // Move 'from' into the next position, unless it's already there
                    if (mergeFrom != mergeInto) {
                        Driver& next = drivers[mergeInto];
                        UASSERT_OBJ(!next.m_vtxp, next.m_flp, "Should have been marked moved");
                        next = from;
                        from.m_vtxp = nullptr; // Mark as moved
                    }
                }

                // Consider next driver
                ++mergeFrom;
            } while (mergeFrom < drivers.size());
            // Rightsize vector
            drivers.erase(drivers.begin() + (mergeInto + 1), drivers.end());
        }

        // If the variable is driven whole, we can just use that driver
        if (drivers.size() == 1 //
            && drivers[0].m_low == 0 //
            && drivers[0].m_vtxp->width() == splicep->width()) {
            VL_DO_DANGLING(splicep->unlinkDelete(m_dfg), splicep);
            // Use the driver directly
            return {drivers[0].m_vtxp, drivers[0].m_flp};
        }

        // Reinsert drivers in order
        for (const Driver& d : drivers) splicep->addDriver(d.m_flp, d.m_low, d.m_vtxp);
        // Use the original splice
        return {splicep, splicep->fileline()};
    }

    // Coalesce array driver - return the coalesced vertex and location for 'splicep'
    std::pair<DfgVertex*, FileLine*> coalesceArray(DfgSpliceArray* const splicep) {
        UASSERT_OBJ(splicep->arity() >= 1, splicep, "Undriven DfgSpliceArray");

        // The drivers of 'splicep'
        std::vector<Driver> drivers;
        drivers.reserve(splicep->arity());

        // Coalesce, gather and unlink all drivers
        int64_t prevHigh = -1; // High index of previous driven range
        splicep->forEachSourceEdge([&](DfgEdge& edge, size_t i) {
            DfgVertex* driverp = edge.sourcep();
            UASSERT_OBJ(driverp, splicep, "Should not have created undriven sources");
            const uint32_t low = splicep->driverIndex(i);
            UASSERT_OBJ(static_cast<int64_t>(low) > prevHigh, splicep,
                        "Drivers should have been normalized");
            prevHigh = low;
            FileLine* flp = splicep->driverFileLine(i);
            // Coalesce
            if (DfgSplicePacked* const spp = driverp->cast<DfgSplicePacked>()) {
                std::tie(driverp, flp) = coalescePacked(spp);
            } else if (DfgSpliceArray* const sap = driverp->cast<DfgSpliceArray>()) {
                std::tie(driverp, flp) = coalesceArray(sap);
            } else if (driverp->is<DfgVertexSplice>()) {
                driverp->v3fatalSrc("Unhandled DfgVertexSplice sub-type"); // LCOV_EXCL_LINE
            }
            // Gather
            drivers.emplace_back(flp, low, driverp);
            // Unlink
            edge.unlinkSource();
        });
        splicep->resetSources();

        // If the variable is driven whole, we can just use that driver
        if (drivers.size() == 1 //
            && drivers[0].m_low == 0 //
            && drivers[0].m_vtxp->dtypep()->isSame(splicep->dtypep())) {
            VL_DO_DANGLING(splicep->unlinkDelete(m_dfg), splicep);
            // Use the driver directly
            return {drivers[0].m_vtxp, drivers[0].m_flp};
        }

        // Reinsert drivers in order
        for (const Driver& d : drivers) splicep->addDriver(d.m_flp, d.m_low, d.m_vtxp);
        // Use the original splice
        return {splicep, splicep->fileline()};
    }

    // CONSTRUCTOR
    AstToDfgCoalesceDrivers(DfgGraph& dfg, DfgVertexVar& var, V3DfgAstToDfgContext& ctx)
        : m_dfg{dfg}
        , m_ctx{ctx} {
        // Nothing to do for un-driven (input) variables
        if (!var.srcp()) return;

        // The driver of a variable must always be a splice vertex, coalesce it
        std::pair<DfgVertex*, FileLine*> normalizedDriver;
        if (DfgSpliceArray* const sArrayp = var.srcp()->cast<DfgSpliceArray>()) {
            normalizedDriver = coalesceArray(sArrayp);
        } else if (DfgSplicePacked* const sPackedp = var.srcp()->cast<DfgSplicePacked>()) {
            normalizedDriver = coalescePacked(sPackedp);
        } else {
            var.v3fatalSrc("Unhandled DfgVertexSplice sub-type"); // LCOV_EXCL_LINE
        }
        var.srcp(normalizedDriver.first);
        var.driverFileLine(normalizedDriver.second);
    }

public:
    // Coalesce drivers of given variable
    static void apply(DfgGraph& dfg, DfgVertexVar& var, V3DfgAstToDfgContext& ctx) {
        AstToDfgCoalesceDrivers{dfg, var, ctx};
    }
};
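
// Illustration only, not part of Verilator: a minimal standalone sketch of the
// in-place "coalesce adjacent ranges" step used by coalescePacked above. The
// names 'Range' and 'coalesceAdjacent' are hypothetical, and a string stands in
// for the DfgConcat vertex. As in the real code, the input is assumed to be
// sorted by low index and non-overlapping (that is, already normalized).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for 'Driver': a driven bit range and its driving expression
struct Range final {
    uint32_t m_low;  // Low bit index of the driven range
    uint32_t m_width;  // Number of bits driven
    std::string m_expr;  // Textual stand-in for the driving vertex
};

// Merge ranges that touch (low + width of one equals low of the next).
// Precondition: 'ranges' is sorted by m_low and ranges do not overlap.
inline std::vector<Range> coalesceAdjacent(std::vector<Range> ranges) {
    if (ranges.size() < 2) return ranges;
    size_t mergeInto = 0;
    for (size_t mergeFrom = 1; mergeFrom < ranges.size(); ++mergeFrom) {
        Range& into = ranges[mergeInto];
        Range& from = ranges[mergeFrom];
        if (into.m_low + into.m_width == from.m_low) {
            // Adjacent: the higher bits go on the left of the concatenation
            into.m_expr = "{" + from.m_expr + ", " + into.m_expr + "}";
            into.m_width += from.m_width;
        } else {
            // Gap: 'from' starts the next run, compact it down if needed
            ++mergeInto;
            if (mergeFrom != mergeInto) ranges[mergeInto] = std::move(from);
        }
    }
    // Drop the entries that were merged or moved
    ranges.erase(ranges.begin() + (mergeInto + 1), ranges.end());
    return ranges;
}
```

// With drivers for bits [3:0], [7:4], [11:10] and [12], the sketch yields two
// ranges: one 8 bits wide at low index 0 and one 3 bits wide at low index 10.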

// Visitor that converts a whole module (when T_Scoped is false),
// or the whole netlist (when T_Scoped is true).
template <bool T_Scoped>
class AstToDfgVisitor final : public VNVisitor {
    // NODE STATE
    const VNUser2InUse m_user2InUse; // Used by AstToDfgConverter

    // TYPES
    using RootType = std::conditional_t<T_Scoped, AstNetlist, AstModule>;
    using Variable = std::conditional_t<T_Scoped, AstVarScope, AstVar>;

    // STATE
    AstToDfgConverter<T_Scoped> m_converter; // The convert instance to use for each construct

    // METHODS
    static Variable* getTarget(const AstVarRef* refp) {
        // TODO: remove the useless reinterpret_casts when C++17 'if constexpr' actually works
        if VL_CONSTEXPR_CXX17 (T_Scoped) {
            return reinterpret_cast<Variable*>(refp->varScopep());
        } else {
            return reinterpret_cast<Variable*>(refp->varp());
        }
    }

    // Mark variables referenced under node
    static void markReferenced(AstNode* nodep) {
        nodep->foreach([](const AstVarRef* refp) {
            Variable* const tgtp = getTarget(refp);
            // Mark as read from non-DFG logic
            if (refp->access().isReadOrRW()) DfgVertexVar::setHasModRdRefs(tgtp);
            // Mark as written from non-DFG logic
            if (refp->access().isWriteOrRW()) DfgVertexVar::setHasModWrRefs(tgtp);
        });
    }

    // VISITORS
    // Unhandled node
    void visit(AstNode* nodep) override { markReferenced(nodep); }

    // Containers to descend through to find logic constructs
    void visit(AstNetlist* nodep) override { iterateAndNextNull(nodep->modulesp()); }
    void visit(AstModule* nodep) override { iterateAndNextNull(nodep->stmtsp()); }
    void visit(AstTopScope* nodep) override { iterate(nodep->scopep()); }
    void visit(AstScope* nodep) override { iterateChildren(nodep); }
    void visit(AstActive* nodep) override {
        if (nodep->hasCombo()) {
            iterateChildren(nodep);
        } else {
            markReferenced(nodep);
        }
    }

    // Non-representable constructs
    void visit(AstCell* nodep) override { markReferenced(nodep); }
    void visit(AstNodeProcedure* nodep) override { markReferenced(nodep); }

    // Potentially representable constructs
    void visit(AstAssignW* nodep) override {
        if (!m_converter.convert(nodep)) markReferenced(nodep);
    }
    void visit(AstAlways* nodep) override {
        if (!m_converter.convert(nodep)) markReferenced(nodep);
    }

    // CONSTRUCTOR
    AstToDfgVisitor(DfgGraph& dfg, RootType& root, V3DfgAstToDfgContext& ctx)
        : m_converter{dfg, ctx} {
        iterate(&root);
    }

public:
    static void apply(DfgGraph& dfg, RootType& root, V3DfgAstToDfgContext& ctx) {
        // Convert all logic under 'root'
        AstToDfgVisitor{dfg, root, ctx};
        if (dumpDfgLevel() >= 9) dfg.dumpDotFilePrefixed(ctx.prefix() + "ast2dfg-conv");
        // Normalize and coalesce all variable drivers
        for (DfgVertexVar& var : dfg.varVertices()) {
            AstToDfgNormalizeDrivers::apply(dfg, var);
            AstToDfgCoalesceDrivers::apply(dfg, var, ctx);
        }
        if (dumpDfgLevel() >= 9) dfg.dumpDotFilePrefixed(ctx.prefix() + "ast2dfg-norm");
        // Remove all unused vertices
        V3DfgPasses::removeUnused(dfg);
        if (dumpDfgLevel() >= 9) dfg.dumpDotFilePrefixed(ctx.prefix() + "ast2dfg-prun");
    }
};
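
// Illustration only, not part of Verilator: a minimal standalone sketch of how
// a single bool template parameter can select both a type (via
// std::conditional_t) and a code path (via 'if constexpr'), as AstToDfgVisitor
// does with T_Scoped. All names here ('Var', 'VarScope', 'Ref', 'TargetOf')
// are hypothetical stand-ins, and the sketch assumes a C++17 compiler, so it
// needs none of the reinterpret_casts mentioned in the TODO of getTarget.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for AstVar, AstVarScope and AstVarRef
struct Var final {
    int m_id;
};
struct VarScope final {
    int m_id;
};
struct Ref final {
    Var* m_varp;  // Unscoped target
    VarScope* m_varScopep;  // Scoped target
};

template <bool T_Scoped>
struct TargetOf final {
    // Type selected at compile time from the template parameter
    using Variable = std::conditional_t<T_Scoped, VarScope, Var>;

    static Variable* get(const Ref& ref) {
        // Only the taken branch is instantiated, so each return type-checks
        if constexpr (T_Scoped) {
            return ref.m_varScopep;
        } else {
            return ref.m_varp;
        }
    }
};
```

// TargetOf<false>::get returns the Var pointer of a Ref, and
// TargetOf<true>::get returns its VarScope pointer.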

std::unique_ptr<DfgGraph> V3DfgPasses::astToDfg(AstModule& module, V3DfgContext& ctx) {
    DfgGraph* const dfgp = new DfgGraph{&module, module.name()};
    AstToDfgVisitor</* T_Scoped: */ false>::apply(*dfgp, module, ctx.m_ast2DfgContext);
    return std::unique_ptr<DfgGraph>{dfgp};
}

std::unique_ptr<DfgGraph> V3DfgPasses::astToDfg(AstNetlist& netlist, V3DfgContext& ctx) {
    DfgGraph* const dfgp = new DfgGraph{nullptr, "netlist"};
    AstToDfgVisitor</* T_Scoped: */ true>::apply(*dfgp, netlist, ctx.m_ast2DfgContext);
    return std::unique_ptr<DfgGraph>{dfgp};
}