verilator/src/V3Dfg.h

857 lines
34 KiB
C
Raw Normal View History

Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// -*- mode: C++; c-file-style: "cc-mode" -*-
//*************************************************************************
// DESCRIPTION: Verilator: Data flow graph (DFG) representation of logic
//
// Code available from: https://verilator.org
//
//*************************************************************************
//
// This program is free software; you can redistribute it and/or modify it
// under the terms of either the GNU Lesser General Public License Version 3
// or the Perl Artistic License Version 2.0.
// SPDX-FileCopyrightText: 2003-2026 Wilson Snyder
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
//
//*************************************************************************
//
// This is a data-flow graph based representation of combinational logic,
// the main difference from a V3Graph is that DfgVertex owns the storage
// of it's input edges (operands/sources/arguments), and can access each
// input edge directly by indexing, making modifications more efficient
// than the linked list based structures used by V3Graph.
//
// A bulk of the DfgVertex sub-types are generated by astgen, and are
// analogous to the corresponding AstNode sub-types.
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
//
// See also the internals documentation docs/internals.rst
//
//*************************************************************************
#ifndef VERILATOR_V3DFG_H_
#define VERILATOR_V3DFG_H_
#include "config_build.h"
#include "verilatedos.h"
#include "V3Ast.h"
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
#include "V3Cfg.h"
#include "V3DfgDataType.h"
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
#include "V3Error.h"
#include "V3Global.h"
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
#include "V3Hash.h"
#include "V3List.h"
#include "V3Dfg__gen_forward_class_decls.h" // From ./astgen
2022-09-30 17:19:53 +02:00
#include <algorithm>
#include <array>
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
#include <functional>
#include <new>
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
#include <type_traits>
#include <unordered_map>
#include <vector>
#ifndef VL_NOT_FINAL
#define VL_NOT_FINAL // This #define fixes broken code folding in the CLion IDE
#endif
// Can T be stored in the memory allocted for U?
template <typename T, typename U>
inline constexpr bool fitsSpaceAllocatedFor() {
return sizeof(T) <= sizeof(U) && alignof(T) <= alignof(U);
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
class DfgEdge;
2024-03-26 00:06:25 +01:00
class DfgVertex;
class DfgGraph;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
class DfgVisitor;
template <typename T_User, bool = fitsSpaceAllocatedFor<T_User, void*>()>
class DfgUserMap;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
//------------------------------------------------------------------------------
// Dataflow graph vertex type enum
class VDfgType final {
public:
#include "V3Dfg__gen_type_enum.h" // From ./astgen
const en m_e;
VDfgType() = delete;
// VDfgType is interconvetible with VDfgType::en
// cppcheck-suppress noExplicitConstructor
constexpr VDfgType(en _e)
: m_e{_e} {}
constexpr operator en() const { return m_e; }
};
constexpr bool operator==(VDfgType lhs, VDfgType rhs) { return lhs.m_e == rhs.m_e; }
constexpr bool operator==(VDfgType lhs, VDfgType::en rhs) { return lhs.m_e == rhs; }
constexpr bool operator==(VDfgType::en lhs, VDfgType rhs) { return lhs == rhs.m_e; }
inline std::ostream& operator<<(std::ostream& os, const VDfgType& t) { return os << t.ascii(); }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
//------------------------------------------------------------------------------
// Dataflow graph edge
class DfgEdge final {
friend class DfgVertex;
DfgVertex* m_srcp = nullptr; // The source vertex driving this edge - might be unconnected
DfgVertex* const m_dstp; // The vertex driven by this edge, which owns this edge, so immutable
V3ListLinks<DfgEdge> m_links; // V3List links in the list of sinks of m_srcp
DfgEdge() = delete;
VL_UNCOPYABLE(DfgEdge);
VL_UNMOVABLE(DfgEdge);
V3ListLinks<DfgEdge>& links() { return m_links; }
using List = V3List<DfgEdge, &DfgEdge::links>;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
public:
explicit DfgEdge(DfgVertex* dstp)
: m_dstp{dstp} {}
~DfgEdge() { unlinkSrcp(); }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// The source (driver) of this edge
DfgVertex* srcp() const { return m_srcp; }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// The sink (consumer) of this edge
DfgVertex* dstp() const { return m_dstp; }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Remove driver of this edge
inline void unlinkSrcp();
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Relink this edge to be driven from the given new source vertex
inline void relinkSrcp(DfgVertex* srcp);
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
};
//------------------------------------------------------------------------------
// Dataflow graph vertex
class DfgVertex VL_NOT_FINAL {
friend class DfgGraph;
friend class DfgEdge;
friend class DfgVisitor;
template <typename, bool>
friend class DfgUserMap;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// STATE
V3ListLinks<DfgVertex> m_links; // V3List links in the DfgGraph
std::vector<std::unique_ptr<DfgEdge>> m_inputps; // Input edges, as vector, for fast indexing
DfgEdge::List m_sinks; // List of sink edges of this vertex
2024-03-26 00:06:25 +01:00
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
FileLine* const m_filelinep; // Source location
const DfgDataType& m_dtype; // Data type of the result of this vertex
const VDfgType m_type; // Vertex type tag
// The only way to access thes is via DfgUserMap, so mutable is appropriate,
// the map can change while the keys (DfgVertex) are const.
mutable uint32_t m_userGeneration = 0; // User data generation number
mutable void* m_userStorage = nullptr; // User data storage - one pointer worth
#ifdef VL_DEBUG
DfgGraph* m_dfgp = nullptr; // Graph this vertex belongs to
#endif
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// METHODS
// Visitor accept method
virtual void accept(DfgVisitor& v) = 0;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Acessor for type List
2024-03-26 00:06:25 +01:00
V3ListLinks<DfgVertex>& links() { return m_links; }
public:
// List type that can store Vertex (which must be a DfgVertex) instances via m_links
template <typename Vertex>
using List = V3List<DfgVertex, &DfgVertex::links, Vertex>;
protected:
// CONSTRUCTOR
DfgVertex(DfgGraph& dfg, VDfgType type, FileLine* flp, const DfgDataType& dt) VL_MT_DISABLED;
// Use unlinkDelete instead
virtual ~DfgVertex() VL_MT_DISABLED = default;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Create a new input edge and return it
DfgEdge* newInput() {
m_inputps.emplace_back(new DfgEdge{this});
return m_inputps.back().get();
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
public:
// Get input 'i'
DfgVertex* inputp(size_t i) const { return m_inputps[i]->srcp(); }
// Relink input 'i'
void inputp(size_t i, DfgVertex* vtxp) { m_inputps[i]->relinkSrcp(vtxp); }
// The number of inputs this vertex has. Some might be unconnected.
size_t nInputs() const { return m_inputps.size(); }
// Unlink all inputs and reset to no inputs - use very carefully
void resetInputs() { m_inputps.clear(); }
// The type of this vertex
VDfgType type() const { return m_type; }
// Source location
FileLine* fileline() const { return m_filelinep; }
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
// The data type of the result of the vertex
const DfgDataType& dtype() const { return m_dtype; }
// Shorthands for accessors of 'dtype()'
bool isPacked() const { return m_dtype.isPacked(); }
bool isArray() const { return m_dtype.isArray(); }
uint32_t size() const { return m_dtype.size(); }
// Type check + size
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
uint32_t width() const {
UASSERT_OBJ(m_dtype.isPacked(), this, "Non packed vertex has no 'width'");
return m_dtype.size();
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
}
// Type check vertex (for debugging)
void typeCheck(const DfgGraph& dfg) const;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Predicate: has 1 or more sinks
bool hasSinks() const { return !m_sinks.empty(); }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Predicate: has 2 or more sinks
bool hasMultipleSinks() const { return m_sinks.hasMultipleElements(); }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Fanout (number of sinks) of this vertex (expensive to compute)
uint32_t fanout() const VL_MT_DISABLED;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Return a canonical variable vertex that holds the value of this vertex,
// or nullptr if no such variable exists in the graph. This is O(fanout).
DfgVertexVar* getResultVar() VL_MT_DISABLED;
2025-07-01 23:55:08 +02:00
// Cache type for 'scopep' below
using ScopeCache = std::unordered_map<const DfgVertex*, AstScope*>;
// Retrieve the prefred AstScope this vertex belongs to. For variable
// vertices this is defined. For operation vertices, we try to find a
// scope based on variables in the upstream logic cone (inputs). If
// there isn't one, (beceuse the whole upstream cone is constant...),
// then the root scope is returned. If 'tryResultVar' is true, we will
// condier the scope of 'getResultVar' first, if it exists.
// Only call this with a scoped DfgGraph
AstScope* scopep(ScopeCache& cache, bool tryResultVar = false) VL_MT_DISABLED;
// If the node has a single sink, return it, otherwise return nullptr
DfgVertex* singleSink() const {
return m_sinks.hasSingleElement() ? m_sinks.frontp()->dstp() : nullptr;
}
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
// First sink of the vertex, if any, otherwise nullptr
DfgVertex* firtsSinkp() { return m_sinks.empty() ? nullptr : m_sinks.frontp()->dstp(); }
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Unlink from container (graph or builder), then delete this vertex
void unlinkDelete(DfgGraph& dfg) VL_MT_DISABLED;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Relink all sinks to be driven from the given new source
void replaceWith(DfgVertex* vtxp) {
UASSERT_OBJ(vtxp != this, this, "Replacing DfgVertex with itself");
UASSERT_OBJ(vtxp->dtype() == dtype(), this, "Replacement DfgVertex has different type");
while (!m_sinks.empty()) m_sinks.frontp()->relinkSrcp(vtxp);
}
// Calls given function 'f' for each source vertex of this vertex. If 'f'
// returns true, further sources are not iterated and this method returns
// true itself. Unconnected source edges are not iterated.
bool foreachSource(std::function<bool(DfgVertex&)> f) {
for (const std::unique_ptr<DfgEdge>& edgep : m_inputps) {
if (DfgVertex* const srcp = edgep->srcp()) {
if (f(*srcp)) return true;
}
}
return false;
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Calls given function 'f' for each source vertex of this vertex. If 'f'
// returns true, further sources are not iterated and this method returns
// true itself. Unconnected source edges are not iterated.
bool foreachSource(std::function<bool(const DfgVertex&)> f) const {
for (const std::unique_ptr<DfgEdge>& edgep : m_inputps) {
if (DfgVertex* const srcp = edgep->srcp()) {
if (f(*srcp)) return true;
}
}
return false;
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Calls given function 'f' for each sink vertex of this vertex. If 'f'
// returns true, further sinks are not iterated and this method returns
// true itself. Unlinking/deleting the given sink during iteration is safe,
// but not other sinks of this vertex.
bool foreachSink(std::function<bool(DfgVertex&)> f) {
for (const DfgEdge* const edgep : m_sinks.unlinkable()) {
if (f(*edgep->dstp())) return true;
}
return false;
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Calls given function 'f' for each sink vertex of this vertex. If 'f'
// returns true, further sinks are not iterated and this method returns
// true itself.
bool foreachSink(std::function<bool(const DfgVertex&)> f) const {
for (const DfgEdge& edge : m_sinks) {
if (f(*edge.dstp())) return true;
}
return false;
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Is this vertex cheaper to re-compute than to load out of memoy
inline bool isCheaperThanLoad() const;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Methods that allow DfgVertex to participate in error reporting/messaging
// LCOV_EXCL_START
void v3errorEnd(std::ostringstream& str) const VL_RELEASE(V3Error::s().m_mutex) {
m_filelinep->v3errorEnd(str);
}
void v3errorEndFatal(std::ostringstream& str) const VL_ATTR_NORETURN
VL_RELEASE(V3Error::s().m_mutex) {
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
m_filelinep->v3errorEndFatal(str);
}
string warnContextPrimary() const VL_REQUIRES(V3Error::s().m_mutex) {
return fileline()->warnContextPrimary();
}
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
string warnContextSecondary() const { return fileline()->warnContextSecondary(); }
string warnMore() const VL_REQUIRES(V3Error::s().m_mutex) { return fileline()->warnMore(); }
string warnOther() const VL_REQUIRES(V3Error::s().m_mutex) { return fileline()->warnOther(); }
// LCOV_EXCL_STOP
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
private:
// For internal use only.
// Note: specializations for particular vertex types are provided by 'astgen'
template <typename T>
inline static bool privateTypeTest(const DfgVertex* nodep);
public:
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Subtype test
template <typename T>
bool is() const {
static_assert(std::is_base_of<DfgVertex, T>::value, "'T' must be a subtype of DfgVertex");
return privateTypeTest<typename std::remove_cv<T>::type>(this);
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
}
// Ensure subtype, then cast to that type
template <typename T>
T* as() {
UASSERT_OBJ(is<T>(), this,
"DfgVertex is not of expected type, but instead has type '" << typeName()
<< "'");
return static_cast<T*>(this);
}
template <typename T>
const T* as() const {
UASSERT_OBJ(is<T>(), this,
"DfgVertex is not of expected type, but instead has type '" << typeName()
<< "'");
return static_cast<const T*>(this);
}
// Cast to subtype, or null if different
template <typename T>
T* cast() {
return is<T>() ? static_cast<T*>(this) : nullptr;
}
template <typename T>
const T* cast() const {
return is<T>() ? static_cast<const T*>(this) : nullptr;
}
// Human-readable vertex type as string for debugging
std::string typeName() const { return m_type.ascii(); }
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
// Human-readable name for source operand with given index for debugging
virtual std::string srcName(size_t idx) const = 0;
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
};
// DfgVertex visitor
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
class DfgVisitor VL_NOT_FINAL {
public:
// Dispatch to most specific 'visit' method on 'vtxp'
void iterate(DfgVertex* vtxp) { vtxp->accept(*this); }
// Least specific visit method is abstract
virtual void visit(DfgVertex* nodep) = 0;
#include "V3Dfg__gen_visitor_decls.h" // From ./astgen
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
};
// DfgVertex subclasses
#include "V3DfgVertices.h"
// Specializations of privateTypeTest
#include "V3Dfg__gen_type_tests.h" // From ./astgen
2024-03-26 00:06:25 +01:00
//------------------------------------------------------------------------------
// Dataflow graph
class DfgGraph final {
friend class DfgUserMapBase;
2024-03-26 00:06:25 +01:00
// MEMBERS
// Variables and constants make up a significant proportion of vertices (40-50% was observed
// in large designs), and they can often be treated specially in algorithms, which in turn
// enables significant Verilation performance gains, so we keep these in separate lists for
// direct access.
DfgVertex::List<DfgVertexVar> m_varVertices; // The variable vertices in the graph
DfgVertex::List<DfgConst> m_constVertices; // The constant vertices in the graph
DfgVertex::List<DfgVertex> m_opVertices; // The operation vertices in the graph
size_t m_size = 0; // Number of vertices in the graph
2025-07-01 23:55:08 +02:00
// Parent of the graph (i.e.: the module containing the logic represented by this graph),
// or nullptr when run after V3Scope
2024-03-26 00:06:25 +01:00
AstModule* const m_modulep;
const std::string m_name; // Name of graph - need not be unique
std::string m_tmpNameStub{""}; // Name stub for temporary variables - computed lazy
2024-03-26 00:06:25 +01:00
// The only way to access thes is via DfgUserMap, so mutable is appropriate,
// the map can change while the graph is const.
mutable bool m_vertexUserInUse = false; // Vertex user data currently in use
mutable uint32_t m_vertexUserGeneration = 0; // Vertex user data generation counter
2024-03-26 00:06:25 +01:00
public:
// CONSTRUCTOR
2025-07-01 23:55:08 +02:00
explicit DfgGraph(AstModule* modulep, const string& name = "") VL_MT_DISABLED;
2024-03-26 00:06:25 +01:00
~DfgGraph() VL_MT_DISABLED;
VL_UNCOPYABLE(DfgGraph);
// METHODS
public:
// Number of vertices in this graph
size_t size() const { return m_size; }
2025-07-01 23:55:08 +02:00
// Parent module - or nullptr when run after V3Scope
2024-03-26 00:06:25 +01:00
AstModule* modulep() const { return m_modulep; }
// Name of this graph
const string& name() const { return m_name; }
// Create a new DfgUserMap
template <typename T_User>
inline DfgUserMap<T_User> makeUserMap() const;
2024-03-26 00:06:25 +01:00
// Access to vertex lists
DfgVertex::List<DfgVertexVar>& varVertices() { return m_varVertices; }
const DfgVertex::List<DfgVertexVar>& varVertices() const { return m_varVertices; }
DfgVertex::List<DfgConst>& constVertices() { return m_constVertices; }
const DfgVertex::List<DfgConst>& constVertices() const { return m_constVertices; }
DfgVertex::List<DfgVertex>& opVertices() { return m_opVertices; }
const DfgVertex::List<DfgVertex>& opVertices() const { return m_opVertices; }
// Add DfgVertex to this graph (assumes not yet contained).
void addVertex(DfgVertex& vtx) {
#ifdef VL_DEBUG
UASSERT_OBJ(!vtx.m_dfgp, &vtx, "Vertex already in a graph");
#endif
// Note: changes here need to be replicated in DfgGraph::mergeGraphs
++m_size;
if (DfgConst* const cVtxp = vtx.cast<DfgConst>()) {
m_constVertices.linkBack(cVtxp);
} else if (DfgVertexVar* const vVtxp = vtx.cast<DfgVertexVar>()) {
m_varVertices.linkBack(vVtxp);
} else {
m_opVertices.linkBack(&vtx);
}
vtx.m_userGeneration = 0;
#ifdef VL_DEBUG
vtx.m_dfgp = this;
#endif
}
// Remove DfgVertex form this graph (assumes it is contained).
void removeVertex(DfgVertex& vtx) {
#ifdef VL_DEBUG
UASSERT_OBJ(vtx.m_dfgp == this, &vtx, "Vertex not in this graph");
#endif
// Note: changes here need to be replicated in DfgGraph::mergeGraphs
--m_size;
if (DfgConst* const cVtxp = vtx.cast<DfgConst>()) {
m_constVertices.unlink(cVtxp);
} else if (DfgVertexVar* const vVtxp = vtx.cast<DfgVertexVar>()) {
m_varVertices.unlink(vVtxp);
} else {
m_opVertices.unlink(&vtx);
}
vtx.m_userGeneration = 0;
#ifdef VL_DEBUG
vtx.m_dfgp = nullptr;
#endif
}
2024-03-26 00:06:25 +01:00
// Calls given function 'f' for each vertex in the graph. It is safe to manipulate any vertices
// in the graph, or to delete/unlink the vertex passed to 'f' during iteration. It is however
// not safe to delete/unlink any vertex in the same graph other than the one passed to 'f'.
void forEachVertex(std::function<void(DfgVertex&)> f) {
for (DfgVertexVar* const vtxp : m_varVertices.unlinkable()) f(*vtxp);
for (DfgConst* const vtxp : m_constVertices.unlinkable()) f(*vtxp);
for (DfgVertex* const vtxp : m_opVertices.unlinkable()) f(*vtxp);
}
2024-03-26 00:06:25 +01:00
// 'const' variant of 'forEachVertex'. No mutation allowed.
void forEachVertex(std::function<void(const DfgVertex&)> f) const {
for (const DfgVertexVar& vtx : m_varVertices) f(vtx);
for (const DfgConst& vtx : m_constVertices) f(vtx);
for (const DfgVertex& vtx : m_opVertices) f(vtx);
}
2024-03-26 00:06:25 +01:00
// Return an identical, independent copy of this graph. Vertex and edge order might differ.
std::unique_ptr<DfgGraph> clone() const VL_MT_DISABLED;
// Merge contents of other graphs into this graph. Deletes the other graphs.
// DfgVertexVar instances representing the same Ast variable are unified.
void mergeGraphs(std::vector<std::unique_ptr<DfgGraph>>&& otherps) VL_MT_DISABLED;
2024-03-26 00:06:25 +01:00
// Genarete a unique name. The provided 'prefix' and 'n' values will be part of the name, and
// must be unique (as a pair) in each invocation for this graph.
std::string makeUniqueName(const std::string& prefix, size_t n) VL_MT_DISABLED;
2025-07-01 23:55:08 +02:00
// Create a new variable with the given name and data type. For a Scoped
// Dfg, the AstScope where the corresponding AstVarScope will be inserted
// must be provided
DfgVertexVar* makeNewVar(FileLine*, const std::string& name, const DfgDataType&,
2025-07-01 23:55:08 +02:00
AstScope*) VL_MT_DISABLED;
2024-03-26 00:06:25 +01:00
// Split this graph into individual components (unique sub-graphs with no edges between them).
// Also removes any vertices that are not weakly connected to any variable.
// Leaves 'this' graph empty.
std::vector<std::unique_ptr<DfgGraph>>
splitIntoComponents(const std::string& label) VL_MT_DISABLED;
2024-03-26 00:06:25 +01:00
// Extract cyclic sub-graphs from 'this' graph. Cyclic sub-graphs are those that contain at
// least one strongly connected component (SCC) plus any other vertices that feed or sink from
// the SCCs, up to a variable boundary. This means that the returned graphs are guaranteed to
// be cyclic, but they are not guaranteed to be strongly connected (however, they are always
// at least weakly connected). Trivial SCCs that are acyclic (i.e.: vertices that are not part
// of a cycle) are left in 'this' graph. This means that at the end 'this' graph is guaranteed
// to be a DAG (acyclic). 'this' will not necessarily be a connected graph at the end, even if
// it was originally connected.
std::vector<std::unique_ptr<DfgGraph>>
extractCyclicComponents(const std::string& label) VL_MT_DISABLED;
2024-03-26 00:06:25 +01:00
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
//-----------------------------------------------------------------------
// Debug dumping
2024-03-26 00:06:25 +01:00
// Dump graph in Graphviz format into the given stream 'os'. 'label' is added to the name of
// the graph which is included in the output.
// If the predicate function 'p' is provided, only those vertices are dumped that satifty it.
void dumpDot(std::ostream& os, const std::string& label,
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
// Dump graph in Graphviz format into a new file with the given 'filename'. 'label' is added to
2024-03-26 00:06:25 +01:00
// the name of the graph which is included in the output.
// If the predicate function 'p' is provided, only those vertices are dumped that satifty it.
void dumpDotFile(const std::string& filename, const std::string& label,
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
// Same as dumpDotFile, but returns the contents as a string.
std::string dumpDotString(const std::string& label,
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
2024-03-26 00:06:25 +01:00
// Dump graph in Graphviz format into a new automatically numbered debug file. 'label' is
// added to the name of the graph, which is included in the file name and the output.
// If the predicate function 'p' is provided, only those vertices are dumped that satifty it.
void dumpDotFilePrefixed(const std::string& label,
std::function<bool(const DfgVertex&)> p = {}) const VL_MT_DISABLED;
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
// Returns the set of vertices in the upstream cones of the given vertices
std::unique_ptr<std::unordered_set<const DfgVertex*>>
sourceCone(const std::vector<const DfgVertex*>&) const VL_MT_DISABLED;
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
// Returns the set of vertices in the downstream cones of the given vertices
std::unique_ptr<std::unordered_set<const DfgVertex*>>
sinkCone(const std::vector<const DfgVertex*>&) const VL_MT_DISABLED;
};
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
namespace V3Dfg {
//-----------------------------------------------------------------------
// Functions for compatibility tests
// Returns true if variable can be represented in the graph
inline bool isSupported(const AstVar* varp) {
if (varp->isIfaceRef()) return false; // Cannot handle interface references
if (varp->delayp()) return false; // Cannot handle delayed variables
if (varp->isSc()) return false; // SystemC variables are special and rare, we can ignore
if (varp->dfgMultidriven()) return false; // Discovered as multidriven on earlier DFG run
if (DfgVertexVar::hasRWRefs(varp)) return false; // Referenced via READWRITE references
return DfgDataType::fromAst(varp->dtypep());
}
// Returns true if variable can be represented in the graph
inline bool isSupported(const AstVarScope* vscp) {
const AstNodeModule* const modp = vscp->scopep()->modp();
if (VN_IS(modp, Module)) {
// Regular module supported
} else if (const AstIface* const ifacep = VN_CAST(modp, Iface)) {
// Interfaces supported if there are no virtual interfaces for
// them, otherwise they cannot be resovled statically.
if (ifacep->hasVirtualRef()) return false;
} else {
return false; // Anything else (package, class, etc) not supported
}
if (DfgVertexVar::hasRWRefs(vscp)) return false; // Referenced via READWRITE references
// Check the AstVar
return isSupported(vscp->varp());
}
} //namespace V3Dfg
//------------------------------------------------------------------------------
// Map from DfgVertices to T_Value implemeneted via DfgVertex::m_userStorage
// Base class with common behavour
class DfgUserMapBase VL_NOT_FINAL {
template <typename, bool>
friend class DfgUserMap;
protected:
// STATE
const DfgGraph* m_dfgp; // The graph this map is for
// The current generation number
const uint32_t m_currentGeneration;
// CONSTRUCTOR
explicit DfgUserMapBase(const DfgGraph* dfgp)
: m_dfgp{dfgp}
, m_currentGeneration{++m_dfgp->m_vertexUserGeneration} {
2025-12-29 03:30:16 +01:00
UASSERT(m_currentGeneration, "DfgGraph user data generation number overflow");
UASSERT(!m_dfgp->m_vertexUserInUse, "DfgUserMap already in use for this DfgGraph");
m_dfgp->m_vertexUserInUse = true;
}
VL_UNCOPYABLE(DfgUserMapBase);
DfgUserMapBase(DfgUserMapBase&& that)
: m_dfgp{that.m_dfgp}
, m_currentGeneration{that.m_currentGeneration} {
that.m_dfgp = nullptr;
}
DfgUserMapBase& operator=(DfgUserMapBase&&) = delete;
public:
~DfgUserMapBase() {
if (m_dfgp) m_dfgp->m_vertexUserInUse = false;
}
};
// Specialization where T_Value fits in DfgVertex::m_userStorage directly
template <typename T_Value>
class DfgUserMap<T_Value, true> final : public DfgUserMapBase {
static_assert(fitsSpaceAllocatedFor<T_Value, decltype(DfgVertex::m_userStorage)>(),
"'T_Value' does not fit 'DfgVertex::m_userStorage'");
friend class DfgGraph;
// CONSTRUCTOR
explicit DfgUserMap(const DfgGraph* dfgp)
: DfgUserMapBase{dfgp} {}
VL_UNCOPYABLE(DfgUserMap);
DfgUserMap& operator=(DfgUserMap&&) = delete;
public:
DfgUserMap(DfgUserMap&&) = default;
~DfgUserMap() = default;
// METHODS
// Retrieve mapped value for 'vtx', value initializing it on first access
T_Value& operator[](const DfgVertex& vtx) {
#ifdef VL_DEBUG
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
#endif
T_Value* const storagep = reinterpret_cast<T_Value*>(&vtx.m_userStorage);
if (vtx.m_userGeneration != m_currentGeneration) {
new (storagep) T_Value{};
vtx.m_userGeneration = m_currentGeneration;
}
return *storagep;
}
// Same as above with pointer as key
T_Value& operator[](const DfgVertex* vtxp) { return (*this)[*vtxp]; }
// Retrieve mapped value of 'vtx', must be alerady present
T_Value& at(const DfgVertex& vtx) const {
#ifdef VL_DEBUG
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
#endif
UASSERT_OBJ(vtx.m_userGeneration == m_currentGeneration, &vtx, "Vertex not in map");
T_Value* const storagep = reinterpret_cast<T_Value*>(&vtx.m_userStorage);
return *storagep;
}
// Same as above with pointer as key
T_Value& at(const DfgVertex* vtxp) const { return (*this).at(*vtxp); }
};
// Specialization where T_Value does not fit in DfgVertex::m_userStorage directly
template <typename T_Value>
class DfgUserMap<T_Value, false> final : public DfgUserMapBase {
static_assert(fitsSpaceAllocatedFor<T_Value*, decltype(DfgVertex::m_userStorage)>(),
"'T_Value*' does not fit 'DfgVertex::m_userStorage'");
friend class DfgGraph;
// STATE
std::deque<T_Value> m_storage; // Storage for T_Value instances
// CONSTRUCTOR
explicit DfgUserMap(const DfgGraph* dfgp)
: DfgUserMapBase{dfgp} {}
VL_UNCOPYABLE(DfgUserMap);
DfgUserMap& operator=(DfgUserMap&&) = delete;
public:
DfgUserMap(DfgUserMap&&) = default;
~DfgUserMap() = default;
// METHODS
// Retrieve mapped value for 'vtx', value initializing it on first access
T_Value& operator[](const DfgVertex& vtx) {
#ifdef VL_DEBUG
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
#endif
T_Value*& storagepr = reinterpret_cast<T_Value*&>(vtx.m_userStorage);
if (vtx.m_userGeneration != m_currentGeneration) {
m_storage.emplace_back();
storagepr = &m_storage.back();
vtx.m_userGeneration = m_currentGeneration;
}
return *storagepr;
}
// Same as above with pointer as key
T_Value& operator[](const DfgVertex* vtxp) { return (*this)[*vtxp]; }
// Retrieve mapped value of 'vtx', must be alerady present
T_Value& at(const DfgVertex& vtx) const {
#ifdef VL_DEBUG
UASSERT_OBJ(vtx.m_dfgp == m_dfgp, &vtx, "Vertex not in this graph");
#endif
UASSERT_OBJ(vtx.m_userGeneration == m_currentGeneration, &vtx, "Vertex not in map");
return *reinterpret_cast<T_Value*&>(vtx.m_userStorage);
}
// Same as above with pointer as key
T_Value& at(const DfgVertex* vtxp) const { return (*this).at(*vtxp); }
};
//------------------------------------------------------------------------------
// Worklist for processing DfgVertices, implemented via DfgUserMap
class DfgWorklist final {
// STATE
// The Graph being processed
DfgGraph& m_dfg;
// Map from vertex to next vertex in the work list
DfgUserMap<DfgVertex*> m_nextp = m_dfg.makeUserMap<DfgVertex*>();
// We want all 'nextp' pointers for vertices that are in the worklist to be
// non-zero (including that of the last element). This allows us to do two
// important things: detect if an element is in the list by checking for a
// non-zero 'nextp'', and easy prefetching without conditionals. The
// address of the worklist itself is a good sentinel as it is a valid
// memory address, and we can easily check for the end of the list.
DfgVertex* const m_sentinelp = reinterpret_cast<DfgVertex*>(this);
// Head of work list
DfgVertex* m_headp = m_sentinelp;
public:
// CONSTRUCTOR
explicit DfgWorklist(DfgGraph& dfg)
: m_dfg{dfg} {}
VL_UNCOPYABLE(DfgWorklist);
VL_UNMOVABLE(DfgWorklist);
~DfgWorklist() = default;
// METHODS
// If 'vtx' is not in the worklist already, add it at the head of the list
// and return ture. If 'vtx' is already in the work list, then do nothing
// and return false.
bool push_front(DfgVertex& vtx) {
// Pick up reference to the next pointer
DfgVertex*& nextpr = m_nextp[vtx];
// If already in work list then nothing to do
if (nextpr) return false;
// Prepend to work list
nextpr = m_headp;
m_headp = &vtx;
return true;
}
// Returns ture iff 'vtx' is in the worklist
bool contains(const DfgVertex& vtx) { return m_nextp[vtx]; }
// Process the worklist by removing the first element, calling on it the
// given callable 'f', and repeat until the worklist is empty. The callable
// 'f' can add furthere vertices to the worklist.
template <typename T_Callable>
void foreach(T_Callable&& f) {
static_assert(vlstd::is_invocable_r<void, T_Callable, DfgVertex&>::value,
"T_Callable 'f' must have a signature compatible with 'void(DfgVertex&)'");
// Process the work list
while (m_headp != m_sentinelp) {
// Pick up the head
DfgVertex& vtx = *m_headp;
// Detach the head
m_headp = m_nextp.at(vtx);
// Prefetch next item
VL_PREFETCH_RW(m_headp);
// This item is now off the work list
m_nextp.at(vtx) = nullptr;
// Apply 'f'
f(vtx);
}
}
};
//------------------------------------------------------------------------------
// Inline method definitions
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
// DfgEdge {{{
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
void DfgEdge::unlinkSrcp() {
if (!m_srcp) return;
#ifdef VL_DEBUG
bool contained = false;
for (const DfgEdge& edge : m_srcp->m_sinks) {
if (&edge != this) continue;
contained = true;
break;
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
}
UASSERT_OBJ(contained, m_srcp, "'m_srcp' does not have this as sink");
#endif
m_srcp->m_sinks.unlink(this);
m_srcp = nullptr;
}
Optimize complex combinational logic in DFG (#6298) This patch adds DfgLogic, which is a vertex that represents a whole, arbitrarily complex combinational AstAlways or AstAssignW in the DfgGraph. Implementing this requires computing the variables live at entry to the AstAlways (variables read by the block), so there is a new ControlFlowGraph data structure and a classical data-flow analysis based live variable analysis to do that at the variable level (as opposed to bit/element level). The actual CFG construction and live variable analysis is best effort, and might fail for currently unhandled constructs or data types. This can be extended later. V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices. The DfgLogic are then subsequently synthesized into primitive operations by the new V3DfgSynthesize pass, which is a combination of the old V3DfgAstToDfg conversion and new code to handle AstAlways blocks with complex flow control. V3DfgSynthesize by default will synthesize roughly the same constructs as V3DfgAstToDfg used to handle before, plus any logic that is part of a combinational cycle within the DfgGraph. This enables breaking up these cycles, for which there are extensions to V3DfgBreakCycles in this patch as well. V3DfgSynthesize will then delete all non synthesized or non synthesizable DfgLogic vertices and the rest of the Dfg pipeline is identical, with minor changes to adjust for the changed representation. Because with this change we can now eliminate many more UNOPTFLAT, DFG has been disabled in all the tests that specifically target testing the scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
void DfgEdge::relinkSrcp(DfgVertex* srcp) {
// Unlink current source, if any
unlinkSrcp();
m_srcp = srcp;
if (m_srcp) m_srcp->m_sinks.linkFront(this);
}
2024-03-26 00:06:25 +01:00
// }}}
2024-03-26 00:06:25 +01:00
// DfgVertex {{{
bool DfgVertex::isCheaperThanLoad() const {
// Array sels are just address computation
if (is<DfgArraySel>()) return true;
// Small constant select from variable
if (const DfgSel* const selp = cast<DfgSel>()) {
if (!selp->fromp()->is<DfgVarPacked>()) return false;
if (selp->fromp()->width() <= VL_QUADSIZE) return true;
const uint32_t lsb = selp->lsb();
const uint32_t msb = lsb + selp->width() - 1;
return VL_BITWORD_E(msb) == VL_BITWORD_E(lsb);
}
// Zero extend of a cheap vertex - Extend(_) was converted to Concat(0, _)
if (const DfgConcat* const catp = cast<DfgConcat>()) {
if (catp->width() > VL_QUADSIZE) return false;
const DfgConst* const lCatp = catp->lhsp()->cast<DfgConst>();
if (!lCatp) return false;
if (!lCatp->isZero()) return false;
return catp->rhsp()->isCheaperThanLoad();
}
// Reduction of a cheap vertex
if (const DfgRedOr* const redOrp = cast<DfgRedOr>()) {
return redOrp->srcp()->isCheaperThanLoad();
}
if (const DfgRedAnd* const redAndp = cast<DfgRedAnd>()) {
return redAndp->srcp()->isCheaperThanLoad();
}
if (const DfgRedXor* const redXorp = cast<DfgRedXor>()) {
return redXorp->srcp()->isCheaperThanLoad();
}
// Otherwise probably not
return false;
}
// }}}
// DfgGraph {{{
2024-03-26 00:06:25 +01:00
template <typename T_User>
DfgUserMap<T_User> DfgGraph::makeUserMap() const {
return DfgUserMap<T_User>{this};
}
2024-03-26 00:06:25 +01:00
// }}}
2024-03-26 00:06:25 +01:00
Introduce DFG based combinational logic optimizer (#3527) Added a new data-flow graph (DFG) based combinational logic optimizer. The capabilities of this covers a combination of V3Const and V3Gate, but is also more capable of transforming combinational logic into simplified forms and more. This entail adding a new internal representation, `DfgGraph`, and appropriate `astToDfg` and `dfgToAst` conversion functions. The graph represents some of the combinational equations (~continuous assignments) in a module, and for the duration of the DFG passes, it takes over the role of AstModule. A bulk of the Dfg vertices represent expressions. These vertex classes, and the corresponding conversions to/from AST are mostly auto-generated by astgen, together with a DfgVVisitor that can be used for dynamic dispatch based on vertex (operation) types. The resulting combinational logic graph (a `DfgGraph`) is then optimized in various ways. Currently we perform common sub-expression elimination, variable inlining, and some specific peephole optimizations, but there is scope for more optimizations in the future using the same representation. The optimizer is run directly before and after inlining. The pre inline pass can operate on smaller graphs and hence converges faster, but still has a chance of substantially reducing the size of the logic on some designs, making inlining both faster and less memory intensive. The post inline pass can then optimize across the inlined module boundaries. No optimization is performed across a module boundary. For debugging purposes, each peephole optimization can be disabled individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one of the optimizations listed in V3DfgPeephole.h, for example -fno-dfg-peephole-remove-not-not. The peephole patterns currently implemented were mostly picked based on the design that inspired this work, and on that design the optimizations yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As you can imagine not having to haul around redundant combinational networks in the rest of the compilation pipeline also helps with memory consumption, and up to 30% peak memory usage of Verilator was observed on the same design. Gains on other arbitrary designs are smaller (and can be improved by analyzing those designs). For example OpenTitan gains between 1-15% speedup depending on build type.
2022-09-23 17:46:22 +02:00
#endif