Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// -*- mode: C++; c-file-style: "cc-mode" -*-
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
// DESCRIPTION: Verilator: Convert DfgGraph to AstModule
|
|
|
|
|
//
|
|
|
|
|
// Code available from: https://verilator.org
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
2025-01-01 14:30:25 +01:00
|
|
|
// Copyright 2003-2025 by Wilson Snyder. This program is free software; you
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// can redistribute it and/or modify it under the terms of either the GNU
|
|
|
|
|
// Lesser General Public License Version 3 or the Perl Artistic License
|
|
|
|
|
// Version 2.0.
|
|
|
|
|
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
2022-10-12 11:19:21 +02:00
|
|
|
// Convert DfgGraph back to AstModule. We recursively construct AstNodeExpr expressions for each
|
2022-09-27 01:06:50 +02:00
|
|
|
// DfgVertex which represents a storage location (e.g.: DfgVarPacked), or has multiple sinks
|
|
|
|
|
// without driving a storage location (and hence needs a temporary variable to duplication). The
|
|
|
|
|
// recursion stops when we reach a DfgVertex representing a storage location (e.g.: DfgVarPacked),
|
|
|
|
|
// or a vertex that that has multiple sinks (as these nodes will have a [potentially new temporary]
|
|
|
|
|
// corresponding// storage location). Redundant variables (those whose source vertex drives
|
|
|
|
|
// multiple variables) are eliminated when possible. Vertices driving multiple variables are
|
|
|
|
|
// rendered once, driving an arbitrarily (but deterministically) chosen canonical variable, and the
|
|
|
|
|
// corresponding redundant variables are assigned from the canonical variable.
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
|
2023-10-18 12:37:46 +02:00
|
|
|
#include "V3PchAstNoMT.h" // VL_MT_DISABLED_CODE_UNIT
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3Dfg.h"
|
|
|
|
|
#include "V3DfgPasses.h"
|
|
|
|
|
#include "V3UniqueNames.h"
|
|
|
|
|
|
|
|
|
|
#include <unordered_map>
|
|
|
|
|
|
|
|
|
|
VL_DEFINE_DEBUG_FUNCTIONS;
|
|
|
|
|
|
|
|
|
|
namespace {
|
|
|
|
|
|
2022-10-12 11:19:21 +02:00
|
|
|
// Create an AstNodeExpr out of a DfgVertex. For most AstNodeExpr subtypes, this can be done
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// automatically. For the few special cases, we provide specializations below
|
2024-11-30 02:20:38 +01:00
|
|
|
template <typename T_Node, typename T_Vertex, typename... Ops>
|
|
|
|
|
T_Node* makeNode(const T_Vertex* vtxp, Ops... ops) {
|
|
|
|
|
T_Node* const nodep = new T_Node{vtxp->fileline(), ops...};
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
UASSERT_OBJ(nodep->width() == static_cast<int>(vtxp->width()), vtxp,
|
|
|
|
|
"Incorrect width in AstNode created from DfgVertex "
|
|
|
|
|
<< vtxp->typeName() << ": " << nodep->width() << " vs " << vtxp->width());
|
|
|
|
|
return nodep;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
//======================================================================
|
|
|
|
|
// Vertices needing special conversion
|
|
|
|
|
|
|
|
|
|
template <>
|
2022-10-12 11:19:21 +02:00
|
|
|
AstExtend* makeNode<AstExtend, DfgExtend, AstNodeExpr*>( //
|
|
|
|
|
const DfgExtend* vtxp, AstNodeExpr* op1) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return new AstExtend{vtxp->fileline(), op1, static_cast<int>(vtxp->width())};
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <>
|
2022-10-12 11:19:21 +02:00
|
|
|
AstExtendS* makeNode<AstExtendS, DfgExtendS, AstNodeExpr*>( //
|
|
|
|
|
const DfgExtendS* vtxp, AstNodeExpr* op1) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return new AstExtendS{vtxp->fileline(), op1, static_cast<int>(vtxp->width())};
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <>
|
2022-10-12 11:19:21 +02:00
|
|
|
AstShiftL* makeNode<AstShiftL, DfgShiftL, AstNodeExpr*, AstNodeExpr*>( //
|
|
|
|
|
const DfgShiftL* vtxp, AstNodeExpr* op1, AstNodeExpr* op2) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return new AstShiftL{vtxp->fileline(), op1, op2, static_cast<int>(vtxp->width())};
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <>
|
2022-10-12 11:19:21 +02:00
|
|
|
AstShiftR* makeNode<AstShiftR, DfgShiftR, AstNodeExpr*, AstNodeExpr*>( //
|
|
|
|
|
const DfgShiftR* vtxp, AstNodeExpr* op1, AstNodeExpr* op2) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return new AstShiftR{vtxp->fileline(), op1, op2, static_cast<int>(vtxp->width())};
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <>
|
2022-10-12 11:19:21 +02:00
|
|
|
AstShiftRS* makeNode<AstShiftRS, DfgShiftRS, AstNodeExpr*, AstNodeExpr*>( //
|
|
|
|
|
const DfgShiftRS* vtxp, AstNodeExpr* op1, AstNodeExpr* op2) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return new AstShiftRS{vtxp->fileline(), op1, op2, static_cast<int>(vtxp->width())};
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
} // namespace
|
|
|
|
|
|
2025-07-01 23:55:08 +02:00
|
|
|
template <bool T_Scoped>
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
class DfgToAstVisitor final : DfgVisitor {
|
2025-07-01 23:55:08 +02:00
|
|
|
// NODE STATE
|
|
|
|
|
|
2025-08-05 11:24:54 +02:00
|
|
|
// AstScope::user2p // The combinational AstActive under this scope
|
|
|
|
|
const VNUser2InUse m_user2InUse;
|
2025-07-01 23:55:08 +02:00
|
|
|
|
|
|
|
|
// TYPES
|
2025-09-02 17:50:40 +02:00
|
|
|
using Variable = std::conditional_t<T_Scoped, AstVarScope, AstVar>;
|
|
|
|
|
using Container = std::conditional_t<T_Scoped, AstActive, AstNodeModule>;
|
2025-07-01 23:55:08 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// STATE
|
2025-07-01 23:55:08 +02:00
|
|
|
AstModule* const m_modp; // The parent/result module - This is nullptr when T_Scoped
|
2025-07-26 21:37:01 +02:00
|
|
|
V3DfgDfgToAstContext& m_ctx; // The context for stats
|
2022-10-12 11:19:21 +02:00
|
|
|
AstNodeExpr* m_resultp = nullptr; // The result node of the current traversal
|
2025-09-02 17:50:40 +02:00
|
|
|
AstAlways* m_alwaysp = nullptr; // Process to add assignments to, if have a default driver
|
|
|
|
|
Container* m_containerp = nullptr; // The AstNodeModule or AstActive to insert assigns into
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
static Variable* getNode(const DfgVertexVar* vtxp) {
|
2025-07-01 23:55:08 +02:00
|
|
|
if VL_CONSTEXPR_CXX17 (T_Scoped) {
|
2025-09-02 17:50:40 +02:00
|
|
|
return reinterpret_cast<Variable*>(vtxp->varScopep());
|
2025-07-01 23:55:08 +02:00
|
|
|
} else {
|
2025-09-02 17:50:40 +02:00
|
|
|
return reinterpret_cast<Variable*>(vtxp->varp());
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static AstActive* getCombActive(AstScope* scopep) {
|
2025-08-05 11:24:54 +02:00
|
|
|
if (!scopep->user2p()) {
|
2025-07-01 23:55:08 +02:00
|
|
|
// Try to find the existing combinational AstActive
|
|
|
|
|
for (AstNode* nodep = scopep->blocksp(); nodep; nodep = nodep->nextp()) {
|
|
|
|
|
AstActive* const activep = VN_CAST(nodep, Active);
|
|
|
|
|
if (!activep) continue;
|
|
|
|
|
if (activep->hasCombo()) {
|
2025-08-05 11:24:54 +02:00
|
|
|
scopep->user2p(activep);
|
2025-07-01 23:55:08 +02:00
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// If there isn't one, create a new one
|
2025-08-05 11:24:54 +02:00
|
|
|
if (!scopep->user2p()) {
|
2025-07-01 23:55:08 +02:00
|
|
|
FileLine* const flp = scopep->fileline();
|
|
|
|
|
AstSenTree* const senTreep
|
|
|
|
|
= new AstSenTree{flp, new AstSenItem{flp, AstSenItem::Combo{}}};
|
|
|
|
|
AstActive* const activep = new AstActive{flp, "", senTreep};
|
2025-08-18 01:14:34 +02:00
|
|
|
activep->senTreeStorep(senTreep);
|
2025-07-01 23:55:08 +02:00
|
|
|
scopep->addBlocksp(activep);
|
2025-08-05 11:24:54 +02:00
|
|
|
scopep->user2p(activep);
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
}
|
2025-08-05 11:24:54 +02:00
|
|
|
return VN_AS(scopep->user2p(), Active);
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-12 11:19:21 +02:00
|
|
|
AstNodeExpr* convertDfgVertexToAstNodeExpr(DfgVertex* vtxp) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
UASSERT_OBJ(!m_resultp, vtxp, "Result already computed");
|
2025-09-10 13:38:49 +02:00
|
|
|
UASSERT_OBJ(vtxp->is<DfgVertexVar>() || vtxp->is<DfgConst>() //
|
|
|
|
|
|| !vtxp->hasMultipleSinks() || vtxp->isCheaperThanLoad(), //
|
2024-03-02 20:49:29 +01:00
|
|
|
vtxp, "Intermediate DFG value with multiple uses");
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
iterate(vtxp);
|
|
|
|
|
UASSERT_OBJ(m_resultp, vtxp, "Missing result");
|
2022-10-12 11:19:21 +02:00
|
|
|
AstNodeExpr* const resultp = m_resultp;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
m_resultp = nullptr;
|
|
|
|
|
return resultp;
|
|
|
|
|
}
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
void createAssignment(FileLine* flp, AstNodeExpr* lhsp, DfgVertex* driverp) {
|
|
|
|
|
// Keep track of statisticss
|
|
|
|
|
++m_ctx.m_resultEquations;
|
|
|
|
|
// Render the driver
|
|
|
|
|
AstNodeExpr* const rhsp = convertDfgVertexToAstNodeExpr(driverp);
|
|
|
|
|
// Update LHS locations to reflect the location of the original driver
|
|
|
|
|
lhsp->foreach([&](AstNode* nodep) { nodep->fileline(flp); });
|
|
|
|
|
|
|
|
|
|
// If using a process, add Assign there
|
|
|
|
|
if (m_alwaysp) {
|
|
|
|
|
m_alwaysp->addStmtsp(new AstAssign{flp, lhsp, rhsp});
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Otherwise create an AssignW
|
Internals: Make AstAssignW a procedural statement (#6280) (#6556)
Initial idea was to remodel AssignW as Assign under Alway. Trying that
uncovered some issues, the most difficult of them was that a delay
attached to a continuous assignment behaves differently from a delay
attached to a blocking assignment statement, so we need to keep the
knowledge of which flavour an assignment was until V3Timing.
So instead of removing AstAssignW, we always wrap it in an AstAlways,
with a special `keyword()` type. This makes it into a proper procedural
statement, which is almost equivalent to AstAssign, except for the case
when they contain a delay. We still gain the benefits of #6280 and can
simplify some code. Every AstNodeStmt should now be under an
AstNodeProcedure - which we should rename to AstProcess, or an
AstNodeFTask). As a result, V3Table can now handle AssignW for free.
Also uncovered and fixed a bug in handling intra-assignment delays if
a function is present on the RHS of an AssignW.
There is more work to be done towards #6280, and potentially simplifying
AssignW handing, but this is the minimal change required to tick it off
the TODO list for #6280.
2025-10-14 10:05:19 +02:00
|
|
|
AstAssignW* const ap = new AstAssignW{flp, lhsp, rhsp};
|
|
|
|
|
m_containerp->addStmtsp(new AstAlways{ap});
|
2025-09-02 17:50:40 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void convertDriver(FileLine* flp, AstNodeExpr* lhsp, DfgVertex* driverp) {
|
2025-07-21 18:33:12 +02:00
|
|
|
if (DfgSplicePacked* const sPackedp = driverp->cast<DfgSplicePacked>()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
// Partial assignment of packed value
|
|
|
|
|
sPackedp->foreachDriver([&](DfgVertex& src, uint32_t lo, FileLine* dflp) {
|
2025-07-21 18:33:12 +02:00
|
|
|
// Create Sel
|
2025-09-02 17:50:40 +02:00
|
|
|
AstConst* const lsbp = new AstConst{dflp, lo};
|
|
|
|
|
const int width = static_cast<int>(src.width());
|
2025-07-21 18:33:12 +02:00
|
|
|
AstSel* const nLhsp = new AstSel{dflp, lhsp->cloneTreePure(false), lsbp, width};
|
|
|
|
|
// Convert source
|
2025-09-02 17:50:40 +02:00
|
|
|
convertDriver(dflp, nLhsp, &src);
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Delete Sel - was cloned
|
|
|
|
|
VL_DO_DANGLING(nLhsp->deleteTree(), nLhsp);
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2022-09-25 17:03:15 +02:00
|
|
|
});
|
2025-07-14 23:09:34 +02:00
|
|
|
return;
|
2022-09-25 17:03:15 +02:00
|
|
|
}
|
2025-07-14 23:09:34 +02:00
|
|
|
|
2025-07-21 18:33:12 +02:00
|
|
|
if (DfgSpliceArray* const sArrayp = driverp->cast<DfgSpliceArray>()) {
|
|
|
|
|
// Partial assignment of array variable
|
2025-09-02 17:50:40 +02:00
|
|
|
sArrayp->foreachDriver([&](DfgVertex& src, uint32_t lo, FileLine* dflp) {
|
|
|
|
|
UASSERT_OBJ(src.size() == 1, &src, "We only handle single elements");
|
2025-07-21 18:33:12 +02:00
|
|
|
// Create ArraySel
|
2025-09-02 17:50:40 +02:00
|
|
|
AstConst* const idxp = new AstConst{dflp, lo};
|
2025-07-21 18:33:12 +02:00
|
|
|
AstArraySel* const nLhsp = new AstArraySel{dflp, lhsp->cloneTreePure(false), idxp};
|
|
|
|
|
// Convert source
|
2025-09-02 17:50:40 +02:00
|
|
|
if (const DfgUnitArray* const uap = src.cast<DfgUnitArray>()) {
|
|
|
|
|
convertDriver(dflp, nLhsp, uap->srcp());
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
} else {
|
2025-09-02 17:50:40 +02:00
|
|
|
convertDriver(dflp, nLhsp, &src);
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
|
|
|
|
// Delete ArraySel - was cloned
|
|
|
|
|
VL_DO_DANGLING(nLhsp->deleteTree(), nLhsp);
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
2025-07-14 23:09:34 +02:00
|
|
|
});
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
2025-08-20 19:21:24 +02:00
|
|
|
if (const DfgUnitArray* const uap = driverp->cast<DfgUnitArray>()) {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Single element array being assigned a unit array. Needs an ArraySel.
|
|
|
|
|
AstConst* const idxp = new AstConst{flp, 0};
|
|
|
|
|
AstArraySel* const nLhsp = new AstArraySel{flp, lhsp->cloneTreePure(false), idxp};
|
|
|
|
|
// Convert source
|
2025-09-02 17:50:40 +02:00
|
|
|
convertDriver(flp, nLhsp, uap->srcp());
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Delete ArraySel - was cloned
|
|
|
|
|
VL_DO_DANGLING(nLhsp->deleteTree(), nLhsp);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Base case: assign vertex to current lhs
|
2025-09-02 17:50:40 +02:00
|
|
|
createAssignment(flp, lhsp->cloneTreePure(false), driverp);
|
2022-09-27 01:06:50 +02:00
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// VISITORS
|
|
|
|
|
void visit(DfgVertex* vtxp) override { // LCOV_EXCL_START
|
2025-03-24 00:51:54 +01:00
|
|
|
vtxp->v3fatalSrc("Unhandled DfgVertex: " << vtxp->typeName());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
} // LCOV_EXCL_STOP
|
|
|
|
|
|
2022-09-27 01:06:50 +02:00
|
|
|
void visit(DfgVarPacked* vtxp) override {
|
2025-07-01 23:55:08 +02:00
|
|
|
m_resultp = new AstVarRef{vtxp->fileline(), getNode(vtxp), VAccess::READ};
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2022-09-27 01:06:50 +02:00
|
|
|
void visit(DfgVarArray* vtxp) override {
|
2025-07-01 23:55:08 +02:00
|
|
|
m_resultp = new AstVarRef{vtxp->fileline(), getNode(vtxp), VAccess::READ};
|
2022-09-27 01:06:50 +02:00
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void visit(DfgConst* vtxp) override { //
|
2022-10-07 16:44:14 +02:00
|
|
|
m_resultp = new AstConst{vtxp->fileline(), vtxp->num()};
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-06 19:34:18 +02:00
|
|
|
void visit(DfgSel* vtxp) override {
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
2024-03-02 20:49:29 +01:00
|
|
|
AstNodeExpr* const fromp = convertDfgVertexToAstNodeExpr(vtxp->fromp());
|
2022-10-06 19:34:18 +02:00
|
|
|
AstConst* const lsbp = new AstConst{flp, vtxp->lsb()};
|
2025-06-24 17:59:09 +02:00
|
|
|
m_resultp = new AstSel{flp, fromp, lsbp, static_cast<int>(vtxp->width())};
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void visit(DfgMux* vtxp) override {
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
2024-03-02 20:49:29 +01:00
|
|
|
AstNodeExpr* const fromp = convertDfgVertexToAstNodeExpr(vtxp->fromp());
|
|
|
|
|
AstNodeExpr* const lsbp = convertDfgVertexToAstNodeExpr(vtxp->lsbp());
|
2025-06-24 17:59:09 +02:00
|
|
|
m_resultp = new AstSel{flp, fromp, lsbp, static_cast<int>(vtxp->width())};
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// The rest of the 'visit' methods are generated by 'astgen'
|
|
|
|
|
#include "V3Dfg__gen_dfg_to_ast.h"
|
|
|
|
|
|
|
|
|
|
// Constructor
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgToAstVisitor(DfgGraph& dfg, V3DfgDfgToAstContext& ctx)
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
: m_modp{dfg.modulep()}
|
|
|
|
|
, m_ctx{ctx} {
|
2025-09-07 21:38:50 +02:00
|
|
|
if (v3Global.opt.debugCheck()) V3DfgPasses::typeCheck(dfg);
|
2022-10-06 12:26:11 +02:00
|
|
|
|
2025-09-07 21:38:50 +02:00
|
|
|
// Convert the graph back to combinational assignments
|
2024-03-02 20:49:29 +01:00
|
|
|
// The graph must have been regularized, so we only need to render assignments
|
2024-03-26 00:06:25 +01:00
|
|
|
for (DfgVertexVar& vtx : dfg.varVertices()) {
|
2022-10-07 17:13:01 +02:00
|
|
|
// If there is no driver (this vertex is an input to the graph), then nothing to do.
|
2025-09-02 17:50:40 +02:00
|
|
|
if (!vtx.srcp()) {
|
|
|
|
|
UASSERT_OBJ(!vtx.defaultp(), &vtx, "Only default driver on variable");
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2022-10-07 17:13:01 +02:00
|
|
|
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
++m_ctx.m_outputVariables;
|
|
|
|
|
|
2025-07-21 18:33:12 +02:00
|
|
|
// Render variable assignments
|
|
|
|
|
FileLine* const flp = vtx.driverFileLine() ? vtx.driverFileLine() : vtx.fileline();
|
|
|
|
|
AstVarRef* const lhsp = new AstVarRef{flp, getNode(&vtx), VAccess::WRITE};
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
VL_RESTORER(m_containerp);
|
|
|
|
|
if VL_CONSTEXPR_CXX17 (T_Scoped) {
|
|
|
|
|
// Add it to the scope holding the target variable
|
|
|
|
|
AstActive* const activep = getCombActive(vtx.varScopep()->scopep());
|
|
|
|
|
m_containerp = reinterpret_cast<Container*>(activep);
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
} else {
|
2025-09-02 17:50:40 +02:00
|
|
|
// Add it to the parent module of the DfgGraph
|
|
|
|
|
m_containerp = reinterpret_cast<Container*>(m_modp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// If there is a default value, render all drivers under an AstAlways
|
|
|
|
|
VL_RESTORER(m_alwaysp);
|
|
|
|
|
if (DfgVertex* const defaultp = vtx.defaultp()) {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
++m_ctx.m_outputVariablesWithDefault;
|
2025-09-02 17:50:40 +02:00
|
|
|
m_alwaysp = new AstAlways{vtx.fileline(), VAlwaysKwd::ALWAYS_COMB, nullptr};
|
|
|
|
|
m_containerp->addStmtsp(m_alwaysp);
|
|
|
|
|
// The default assignment needs to go first
|
|
|
|
|
createAssignment(vtx.fileline(), lhsp->cloneTreePure(false), defaultp);
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
|
|
|
|
|
// Render the drivers
|
|
|
|
|
convertDriver(flp, lhsp, vtx.srcp());
|
|
|
|
|
|
|
|
|
|
// convetDriver always clones lhsp
|
|
|
|
|
VL_DO_DANGLING(lhsp->deleteTree(), lhsp);
|
2022-10-07 17:13:01 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
public:
|
2025-07-26 21:37:01 +02:00
|
|
|
static void apply(DfgGraph& dfg, V3DfgDfgToAstContext& ctx) { DfgToAstVisitor{dfg, ctx}; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
|
2025-07-26 21:37:01 +02:00
|
|
|
void V3DfgPasses::dfgToAst(DfgGraph& dfg, V3DfgContext& ctx) {
|
2025-07-01 23:55:08 +02:00
|
|
|
if (dfg.modulep()) {
|
2025-07-26 21:37:01 +02:00
|
|
|
DfgToAstVisitor</* T_Scoped: */ false>::apply(dfg, ctx.m_dfg2AstContext);
|
2025-07-01 23:55:08 +02:00
|
|
|
} else {
|
2025-07-26 21:37:01 +02:00
|
|
|
DfgToAstVisitor</* T_Scoped: */ true>::apply(dfg, ctx.m_dfg2AstContext);
|
2025-07-01 23:55:08 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|