Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
Its capabilities cover a combination of V3Const and V3Gate, but it is
also more capable of transforming combinational logic into simplified
forms.
This entails adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
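
As a rough illustration of the visitor-based dynamic dispatch mentioned above, here is a minimal sketch. All names in it (`Vertex`, `VertexNot`, `VertexAnd`, `Visitor`, `CountingVisitor`) are hypothetical stand-ins for this example, not the generated Verilator classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct VertexNot;
struct VertexAnd;

// Hypothetical visitor interface: one 'visit' overload per concrete vertex type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(VertexNot&) = 0;
    virtual void visit(VertexAnd&) = 0;
};

// Hypothetical base vertex: 'accept' forwards to the overload matching the dynamic type.
struct Vertex {
    virtual ~Vertex() = default;
    virtual void accept(Visitor& v) = 0;
};

struct VertexNot final : Vertex {
    void accept(Visitor& v) override { v.visit(*this); }
};

struct VertexAnd final : Vertex {
    void accept(Visitor& v) override { v.visit(*this); }
};

// Example visitor that counts how many vertices of each operation type it sees.
struct CountingVisitor final : Visitor {
    int nots = 0;
    int ands = 0;
    void visit(VertexNot&) override { ++nots; }
    void visit(VertexAnd&) override { ++ands; }
};
```

A caller would hold vertices through base pointers and call `accept` on each one. The visitor then receives the concrete type without any explicit type switch.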
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
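
To make the idea of a peephole rewrite concrete, the following is a minimal sketch of a single rule (rewriting an operand of the form `~~x` to `x`) applied repeatedly until nothing changes. It assumes a toy expression graph; `Node`, `Op`, `Graph`, `removeNotNot`, and `runToFixedPoint` here are hypothetical illustrations and do not reflect the actual `DfgGraph` implementation.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy operation kinds; hypothetical, not Verilator's vertex types.
enum class Op { Var, Not, And };

struct Node {
    Op op;
    std::vector<Node*> srcs;  // Input operands, accessed by index
};

// Owns all nodes so the raw pointers between them stay valid.
struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;
    Node* make(Op op, std::vector<Node*> srcs = {}) {
        nodes.push_back(std::make_unique<Node>(Node{op, std::move(srcs)}));
        return nodes.back().get();
    }
};

// One pass of the rule: any operand of the form '~~x' is replaced by 'x'.
// Only operands are rewritten, so a '~~x' that nothing consumes is left alone.
// Returns true if the graph was changed.
bool removeNotNot(Graph& g) {
    bool changed = false;
    for (auto& np : g.nodes) {
        for (Node*& src : np->srcs) {
            if (src->op == Op::Not && src->srcs[0]->op == Op::Not) {
                src = src->srcs[0]->srcs[0];
                changed = true;
            }
        }
    }
    return changed;
}

// Repeat the pass until it reports no change. Returns the number of changing passes.
int runToFixedPoint(Graph& g) {
    int passes = 0;
    while (removeNotNot(g)) ++passes;
    return passes;
}
```

For example, building `a = ~~~~x & x` and calling `runToFixedPoint` leaves both operands of `a` pointing directly at `x`. The bypassed `Not` nodes stay in the graph as dead logic, which a real optimizer would also remove.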
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peephole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yield ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine, not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to a 30% reduction in peak memory usage of Verilator
was observed on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00

// -*- mode: C++; c-file-style: "cc-mode" -*-
//*************************************************************************
// DESCRIPTION: Verilator: Data flow graph (DFG) representation of logic
//
// Code available from: https://verilator.org
//
//*************************************************************************
//
// Copyright 2003-2022 by Wilson Snyder. This program is free software; you
// can redistribute it and/or modify it under the terms of either the GNU
// Lesser General Public License Version 3 or the Perl Artistic License
// Version 2.0.
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
//
//*************************************************************************
//
// This is a data-flow graph based representation of combinational logic,
// the main difference from a V3Graph is that DfgVertex owns the storage
// of its input edges (operands/sources/arguments), and can access each
// input edge directly by indexing, making modifications more efficient
// than the linked list based structures used by V3Graph.
//
// A bulk of the DfgVertex sub-types are generated by astgen, and are
// analogous to the corresponding AstNode sub-types.
//
// See also the internals documentation docs/internals.rst
//
//*************************************************************************

#ifndef VERILATOR_V3DFG_H_
#define VERILATOR_V3DFG_H_

#include "config_build.h"
#include "verilatedos.h"

#include "V3Ast.h"
#include "V3Error.h"
#include "V3Hash.h"
#include "V3Hasher.h"
#include "V3List.h"

#include <algorithm>
#include <array>
#include <functional>
#include <type_traits>
#include <unordered_map>
#include <vector>

class DfgVertex;
class DfgEdge;
class DfgVisitor;

//------------------------------------------------------------------------------

// Specialization of std::hash for a std::pair<const DfgVertex*, const DfgVertex*> for use below
template <>
struct std::hash<std::pair<const DfgVertex*, const DfgVertex*>> final {
    size_t operator()(const std::pair<const DfgVertex*, const DfgVertex*>& item) const {
        const size_t a = reinterpret_cast<std::uintptr_t>(item.first);
        const size_t b = reinterpret_cast<std::uintptr_t>(item.second);
        constexpr size_t halfWidth = 8 * sizeof(b) / 2;
        return a ^ ((b << halfWidth) | (b >> halfWidth));
    }
};

//------------------------------------------------------------------------------
// Dataflow graph
//------------------------------------------------------------------------------

class DfgGraph final {
    friend class DfgVertex;

    // MEMBERS
    size_t m_size = 0; // Number of vertices in the graph
    V3List<DfgVertex*> m_vertices; // The vertices in the graph
    // Parent of the graph (i.e.: the module containing the logic represented by this graph).
    AstModule* const m_modulep;
    const string m_name; // Name of graph (for debugging)

public:
    // CONSTRUCTOR
    explicit DfgGraph(AstModule& module, const string& name = "");
    ~DfgGraph();
    VL_UNCOPYABLE(DfgGraph);

    // METHODS
public:
    // Add DfgVertex to this graph (assumes not yet contained).
    inline void addVertex(DfgVertex& vtx);
    // Remove DfgVertex from this graph (assumes it is contained).
    inline void removeVertex(DfgVertex& vtx);
    // Number of vertices in this graph
    size_t size() const { return m_size; }
    // Parent module
    AstModule* modulep() const { return m_modulep; }
    // Name of this graph
    const string& name() const { return m_name; }

    // Calls given function 'f' for each vertex in the graph. It is safe to manipulate any vertices
    // in the graph, or to delete/unlink the vertex passed to 'f' during iteration. It is however
    // not safe to delete/unlink any vertex in the same graph other than the one passed to 'f'.
    inline void forEachVertex(std::function<void(DfgVertex&)> f);

    // 'const' variant of 'forEachVertex'. No mutation allowed.
    inline void forEachVertex(std::function<void(const DfgVertex&)> f) const;

    // Same as 'forEachVertex' but iterates in reverse order.
    inline void forEachVertexInReverse(std::function<void(DfgVertex&)> f);

    // Returns first vertex of type 'Vertex' that satisfies the given predicate 'p',
    // or nullptr if no such vertex exists in the graph.
    template <typename Vertex>
    inline Vertex* findVertex(std::function<bool(const Vertex&)> p) const;

    // Add contents of other graph to this graph. Leaves other graph empty.
    void addGraph(DfgGraph& other);

    // Topologically sort the list of vertices in this graph (such that 'forEachVertex' will
    // iterate in topological order), or reverse topologically if the passed boolean argument is
    // true. Returns true on success (the graph is acyclic and a topological order exists), false
    // if the graph is cyclic. If the graph is cyclic, the vertex ordering is not modified.
    bool sortTopologically(bool reverse = false);

    // Split this graph into individual components (unique sub-graphs with no edges between them).
    // Leaves 'this' graph empty.
    std::vector<std::unique_ptr<DfgGraph>> splitIntoComponents(std::string label);

    // Extract cyclic sub-graphs from 'this' graph. Cyclic sub-graphs are those that contain at
    // least one strongly connected component (SCC) plus any other vertices that feed or sink from
    // the SCCs, up to a variable boundary. This means that the returned graphs are guaranteed to
    // be cyclic, but they are not guaranteed to be strongly connected (however, they are always
    // at least weakly connected). Trivial SCCs that are acyclic (i.e.: vertices that are not part
    // of a cycle) are left in 'this' graph. This means that at the end 'this' graph is guaranteed
    // to be a DAG (acyclic). 'this' will not necessarily be a connected graph at the end, even if
    // it was originally connected.
    std::vector<std::unique_ptr<DfgGraph>> extractCyclicComponents(std::string label);

    // Apply the given function to all vertices in the graph. The function return value
    // indicates that a change has been made to the graph. Repeat until no changes reported.
    void runToFixedPoint(std::function<bool(DfgVertex&)> f);

    // Dump graph in Graphviz format into the given stream 'os'. 'label' is added to the name of
    // the graph which is included in the output.
    void dumpDot(std::ostream& os, const string& label = "") const;
    // Dump graph in Graphviz format into a new file with the given 'fileName'. 'label' is added to
    // the name of the graph which is included in the output.
    void dumpDotFile(const string& fileName, const string& label = "") const;
    // Dump graph in Graphviz format into a new automatically numbered debug file. 'label' is
    // added to the name of the graph, which is included in the file name and the output.
    void dumpDotFilePrefixed(const string& label = "") const;
    // Dump upstream (source) logic cone starting from given vertex into a file with the given
    // 'fileName'. 'name' is the name of the graph, which is included in the output.
    void dumpDotUpstreamCone(const string& fileName, const DfgVertex& vtx,
                             const string& name = "") const;
    // Dump all individual logic cones driving external variables in Graphviz format into separate
    // new automatically numbered debug files. 'label' is added to the name of the graph, which is
    // included in the file names and the output. This is useful for very large graphs that are
    // otherwise difficult to browse visually due to their size.
    void dumpDotAllVarConesPrefixed(const string& label = "") const;
};

//------------------------------------------------------------------------------
// Dataflow graph edge
//------------------------------------------------------------------------------

class DfgEdge final {
    friend class DfgVertex;

    DfgEdge* m_nextp = nullptr; // Next edge in sink list
    DfgEdge* m_prevp = nullptr; // Previous edge in sink list
    DfgVertex* m_sourcep = nullptr; // The source vertex driving this edge
    // Note that the sink vertex owns the edge, so it is immutable, but because we want to be able
    // to allocate these as arrays, we use a default constructor + 'init' method to set m_sinkp.
    DfgVertex* const m_sinkp = nullptr; // The sink vertex

public:
    DfgEdge() {}
    void init(DfgVertex* sinkp) { const_cast<DfgVertex*&>(m_sinkp) = sinkp; }

    // The source (driver) of this edge
    DfgVertex* sourcep() const { return m_sourcep; }
    // The sink (consumer) of this edge
    DfgVertex* sinkp() const { return m_sinkp; }
    // Remove driver of this edge
    void unlinkSource();
    // Relink this edge to be driven from the given new source vertex
    void relinkSource(DfgVertex* newSourcep);
};

//------------------------------------------------------------------------------
// Dataflow graph vertex
//------------------------------------------------------------------------------

// Reuse the generated type constants
using DfgType = VNType;

// Base data flow graph vertex
class DfgVertex VL_NOT_FINAL {
    friend class DfgGraph;
    friend class DfgEdge;
    friend class DfgVisitor;

    // STATE
    V3ListEnt<DfgVertex*> m_verticesEnt; // V3List handle of this vertex, kept under the DfgGraph
protected:
    DfgEdge* m_sinksp = nullptr; // List of sinks of this vertex
    FileLine* const m_filelinep; // Source location
    AstNodeDType* m_dtypep = nullptr; // Data type of the result of this vertex
    const DfgType m_type;

    // CONSTRUCTOR
    DfgVertex(DfgGraph& dfg, FileLine* flp, AstNodeDType* dtypep, DfgType type);

public:
    virtual ~DfgVertex();

    // METHODS
private:
    // Visitor accept method
    virtual void accept(DfgVisitor& v) = 0;

    // Part of Vertex equality only dependent on this vertex
    virtual bool selfEquals(const DfgVertex& that) const;

    // Part of Vertex hash only dependent on this vertex
    virtual V3Hash selfHash() const;

public:
    // Supported packed types
    static bool isSupportedPackedDType(const AstNodeDType* dtypep) {
        dtypep = dtypep->skipRefp();
        if (const AstBasicDType* const typep = VN_CAST(dtypep, BasicDType)) {
            return typep->keyword().isIntNumeric();
        }
        if (const AstPackArrayDType* const typep = VN_CAST(dtypep, PackArrayDType)) {
            return isSupportedPackedDType(typep->subDTypep());
        }
        if (const AstNodeUOrStructDType* const typep = VN_CAST(dtypep, NodeUOrStructDType)) {
            return typep->packed();
        }
        return false;
    }

    // Returns true if an AstNode with the given 'dtype' can be represented as a DfgVertex
    static bool isSupportedDType(const AstNodeDType* dtypep) {
        dtypep = dtypep->skipRefp();
        // Support unpacked arrays of packed types
        if (const AstUnpackArrayDType* const typep = VN_CAST(dtypep, UnpackArrayDType)) {
            return isSupportedPackedDType(typep->subDTypep());
        }
        // Support packed types
        return isSupportedPackedDType(dtypep);
    }

    // Return data type used to represent any packed value of the given 'width'. All packed types
    // of a given width use the same canonical data type, as the only interesting information is
    // the total width.
    static AstNodeDType* dtypeForWidth(uint32_t width) {
        return v3Global.rootp()->typeTablep()->findLogicDType(width, width, VSigning::UNSIGNED);
    }

    // Return data type used to represent the type of 'nodep' when converted to a DfgVertex
    static AstNodeDType* dtypeFor(const AstNode* nodep) {
        UDEBUGONLY(UASSERT_OBJ(isSupportedDType(nodep->dtypep()), nodep, "Unsupported dtype"););
        // For simplicity, all packed types are represented with a fixed type
        if (AstUnpackArrayDType* const typep = VN_CAST(nodep->dtypep(), UnpackArrayDType)) {
            // TODO: these need interning via AstTypeTable otherwise they leak
            return new AstUnpackArrayDType{typep->fileline(),
                                           dtypeForWidth(typep->subDTypep()->width()),
                                           typep->rangep()->cloneTree(false)};
        }
        return dtypeForWidth(nodep->width());
    }

    // Source location
    FileLine* fileline() const { return m_filelinep; }
    // The data type of the result of this vertex
    AstNodeDType* dtypep() const { return m_dtypep; }
    // The type of this vertex
    DfgType type() const { return m_type; }

    // Width of result
    uint32_t width() const {
        // Everything supported is packed now, so we can just do this:
        UASSERT_OBJ(VN_IS(dtypep(), BasicDType), this, "'width()' called on unpacked value");
        return dtypep()->width();
    }

    // Cache type for 'equals' below
    using EqualsCache = std::unordered_map<std::pair<const DfgVertex*, const DfgVertex*>, bool>;

    // Vertex equality (based on this vertex and all upstream vertices feeding into this vertex).
    // Returns true, if the vertices can be substituted for each other without changing the
    // semantics of the logic. The 'cache' argument is used to store results to avoid repeat
    // evaluations, but it requires that the upstream sources of the compared vertices do not
    // change between invocations.
    bool equals(const DfgVertex& that, EqualsCache& cache) const;

    // Uncached version of 'equals'
    bool equals(const DfgVertex& that) const {
        EqualsCache cache;  // Still cache recursive calls within this invocation
        return equals(that, cache);
    }

    // Cache type for 'hash' below
    using HashCache = std::unordered_map<const DfgVertex*, V3Hash>;

    // Hash of vertex (depends on this vertex and all upstream vertices feeding into this vertex).
    // The 'cache' argument is used to store results to avoid repeat evaluations, but it requires
    // that the upstream sources of the vertex do not change between invocations.
    V3Hash hash(HashCache& cache) const;

    // Uncached version of 'hash'
    V3Hash hash() const {
        HashCache cache;  // Still cache recursive calls within this invocation
        return hash(cache);
    }

    // Source edges of this vertex
    virtual std::pair<DfgEdge*, size_t> sourceEdges() = 0;

    // Source edges of this vertex
    virtual std::pair<const DfgEdge*, size_t> sourceEdges() const = 0;

    // Arity (number of sources) of this vertex
    size_t arity() const { return sourceEdges().second; }

    // Predicate: has 1 or more sinks
    bool hasSinks() const { return m_sinksp != nullptr; }

    // Predicate: has 2 or more sinks
    bool hasMultipleSinks() const { return m_sinksp && m_sinksp->m_nextp; }

    // Fanout (number of sinks) of this vertex (expensive to compute)
    uint32_t fanout() const;

    // Unlink from container (graph or builder), then delete this vertex
    void unlinkDelete(DfgGraph& dfg);

    // Relink all sinks to be driven from the given new source
    void replaceWith(DfgVertex* newSourcep);

    // Calls given function 'f' for each source vertex of this vertex
    // Unconnected source edges are not iterated.
    inline void forEachSource(std::function<void(const DfgVertex&)> f) const;

    // Calls given function 'f' for each source edge of this vertex. Also passes source index.
    inline void forEachSourceEdge(std::function<void(DfgEdge&, size_t)> f);

    // Calls given function 'f' for each source edge of this vertex. Also passes source index.
    inline void forEachSourceEdge(std::function<void(const DfgEdge&, size_t)> f) const;

    // Calls given function 'f' for each sink vertex of this vertex
    // Unlinking/deleting the given sink during iteration is safe, but not other sinks of this
    // vertex.
    inline void forEachSink(std::function<void(DfgVertex&)> f);

    // Calls given function 'f' for each sink vertex of this vertex
    inline void forEachSink(std::function<void(const DfgVertex&)> f) const;

    // Calls given function 'f' for each sink edge of this vertex.
    // Unlinking/deleting the given sink during iteration is safe, but not other sinks of this
    // vertex.
    inline void forEachSinkEdge(std::function<void(DfgEdge&)> f);

    // Calls given function 'f' for each sink edge of this vertex.
    inline void forEachSinkEdge(std::function<void(const DfgEdge&)> f) const;

    // Returns first source edge which satisfies the given predicate 'p', or nullptr if no such
    // source edge exists
    inline const DfgEdge* findSourceEdge(std::function<bool(const DfgEdge&, size_t)> p) const;

    // Returns first sink vertex of type 'Vertex' which satisfies the given predicate 'p',
    // or nullptr if no such sink vertex exists
    template <typename Vertex>
    inline Vertex* findSink(std::function<bool(const Vertex&)> p) const;

    // Returns first sink vertex of type 'Vertex', or nullptr if no such sink vertex exists.
    // This is a special case of 'findSink' above with the predicate always true.
    template <typename Vertex>
    inline Vertex* findSink() const;

    // Is this a DfgConst that is all zeroes
    inline bool isZero() const;

    // Is this a DfgConst that is all ones
    inline bool isOnes() const;

    // Methods that allow DfgVertex to participate in error reporting/messaging
    void v3errorEnd(std::ostringstream& str) const { m_filelinep->v3errorEnd(str); }
    void v3errorEndFatal(std::ostringstream& str) const VL_ATTR_NORETURN {
        m_filelinep->v3errorEndFatal(str);
    }
    string warnContextPrimary() const { return fileline()->warnContextPrimary(); }
    string warnContextSecondary() const { return fileline()->warnContextSecondary(); }
    string warnMore() const { return fileline()->warnMore(); }
    string warnOther() const { return fileline()->warnOther(); }

// Subtype test
|
|
|
|
|
template <typename T>
|
|
|
|
|
bool is() const {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertex, T>::value, "'T' must be a subtype of DfgVertex");
|
|
|
|
|
return m_type == T::dfgType();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Ensure subtype, then cast to that type
|
|
|
|
|
template <typename T>
|
|
|
|
|
T* as() {
|
|
|
|
|
UASSERT_OBJ(is<T>(), this,
|
|
|
|
|
"DfgVertex is not of expected type, but instead has type '" << typeName()
|
|
|
|
|
<< "'");
|
|
|
|
|
return static_cast<T*>(this);
|
|
|
|
|
}
|
|
|
|
|
template <typename T>
|
|
|
|
|
const T* as() const {
|
|
|
|
|
UASSERT_OBJ(is<T>(), this,
|
|
|
|
|
"DfgVertex is not of expected type, but instead has type '" << typeName()
|
|
|
|
|
<< "'");
|
|
|
|
|
return static_cast<const T*>(this);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Cast to subtype, or null if different
|
|
|
|
|
template <typename T>
|
|
|
|
|
T* cast() {
|
|
|
|
|
return is<T>() ? static_cast<T*>(this) : nullptr;
|
|
|
|
|
}
|
|
|
|
|
template <typename T>
|
|
|
|
|
const T* cast() const {
|
|
|
|
|
return is<T>() ? static_cast<const T*>(this) : nullptr;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Human-readable vertex type as string for debugging
|
|
|
|
|
const string typeName() const { return m_type.ascii(); }
|
|
|
|
|
|
|
|
|
|
// Human-readable name for source operand with given index for debugging
|
|
|
|
|
virtual const string srcName(size_t idx) const = 0;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
// DfgVertices are, well ... DfgVertices
|
|
|
|
|
template <>
|
|
|
|
|
constexpr bool DfgVertex::is<DfgVertex>() const {
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
template <>
|
|
|
|
|
constexpr DfgVertex* DfgVertex::as<DfgVertex>() {
|
|
|
|
|
return this;
|
|
|
|
|
}
|
|
|
|
|
template <>
|
|
|
|
|
constexpr const DfgVertex* DfgVertex::as<DfgVertex>() const {
|
|
|
|
|
return this;
|
|
|
|
|
}
|
|
|
|
|
template <>
|
|
|
|
|
constexpr DfgVertex* DfgVertex::cast<DfgVertex>() {
|
|
|
|
|
return this;
|
|
|
|
|
}
|
|
|
|
|
template <>
|
|
|
|
|
constexpr const DfgVertex* DfgVertex::cast<DfgVertex>() const {
|
|
|
|
|
return this;
|
|
|
|
|
}
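
The `is`/`as`/`cast` trio above is a tag-based alternative to `dynamic_cast`. The following is a minimal, self-contained sketch of the same idea, assuming hypothetical stand-in types (`Kind`, `Vertex`, `ConstVertex`, `NotVertex`, `constValueOr`) that are not part of Verilator, and using plain `assert` in place of `UASSERT_OBJ`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the real Verilator classes.
enum class Kind : uint8_t { Const, Not };

class Vertex {
    const Kind m_kind;  // Type tag, fixed at construction

protected:
    explicit Vertex(Kind kind)
        : m_kind{kind} {}

public:
    virtual ~Vertex() = default;

    // Subtype test: compare the stored tag with the tag of the requested type
    template <typename T>
    bool is() const {
        return m_kind == T::kind();
    }

    // Checked cast: the caller claims the vertex has type T
    template <typename T>
    T* as() {
        assert(is<T>() && "Vertex is not of expected type");
        return static_cast<T*>(this);
    }

    // Tentative cast: nullptr when the vertex has a different type
    template <typename T>
    T* cast() {
        return is<T>() ? static_cast<T*>(this) : nullptr;
    }
};

class ConstVertex final : public Vertex {
public:
    uint32_t value;
    explicit ConstVertex(uint32_t v)
        : Vertex{Kind::Const}
        , value{v} {}
    static constexpr Kind kind() { return Kind::Const; }
};

class NotVertex final : public Vertex {
public:
    NotVertex()
        : Vertex{Kind::Not} {}
    static constexpr Kind kind() { return Kind::Not; }
};

// Returns the value held by 'vtxp' if it is a constant, otherwise 'fallback'
inline uint32_t constValueOr(Vertex* vtxp, uint32_t fallback) {
    if (ConstVertex* const constp = vtxp->cast<ConstVertex>()) return constp->value;
    return fallback;
}
```

In this sketch `cast` suits pattern matching where a mismatch is expected, while `as` documents an invariant and fails loudly when it is violated.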

template <size_t Arity>
class DfgVertexWithArity VL_NOT_FINAL : public DfgVertex {
    static_assert(1 <= Arity && Arity <= 4, "Arity must be between 1 and 4 inclusive");

    std::array<DfgEdge, Arity> m_srcs;  // Source edges

protected:
    DfgVertexWithArity<Arity>(DfgGraph& dfg, FileLine* flp, AstNodeDType* dtypep, DfgType type)
        : DfgVertex{dfg, flp, dtypep, type} {
        // Initialize source edges
        for (size_t i = 0; i < Arity; ++i) m_srcs[i].init(this);
    }

    ~DfgVertexWithArity<Arity>() override = default;

public:
    std::pair<DfgEdge*, size_t> sourceEdges() override {  //
        return {m_srcs.data(), Arity};
    }
    std::pair<const DfgEdge*, size_t> sourceEdges() const override {
        return {m_srcs.data(), Arity};
    }

    template <size_t Index>
    DfgEdge* sourceEdge() {
        static_assert(Index < Arity, "Source index out of range");
        return &m_srcs[Index];
    }

    template <size_t Index>
    const DfgEdge* sourceEdge() const {
        static_assert(Index < Arity, "Source index out of range");
        return &m_srcs[Index];
    }

    template <size_t Index>
    DfgVertex* source() const {
        static_assert(Index < Arity, "Source index out of range");
        return m_srcs[Index].sourcep();
    }

    template <size_t Index>
    void relinkSource(DfgVertex* newSourcep) {
        static_assert(Index < Arity, "Source index out of range");
        UASSERT_OBJ(m_srcs[Index].sinkp() == this, this, "Inconsistent");
        m_srcs[Index].relinkSource(newSourcep);
    }

    // Named source getter/setter for unary vertices
    template <size_t A = Arity>
    typename std::enable_if<A == 1, DfgVertex*>::type srcp() const {
        static_assert(A == Arity, "Should not be changed");
        return source<0>();
    }
    template <size_t A = Arity>
    typename std::enable_if<A == 1, void>::type srcp(DfgVertex* vtxp) {
        static_assert(A == Arity, "Should not be changed");
        relinkSource<0>(vtxp);
    }

    // Named source getter/setter for binary vertices
    template <size_t A = Arity>
    typename std::enable_if<A == 2, DfgVertex*>::type lhsp() const {
        static_assert(A == Arity, "Should not be changed");
        return source<0>();
    }
    template <size_t A = Arity>
    typename std::enable_if<A == 2, void>::type lhsp(DfgVertex* vtxp) {
        static_assert(A == Arity, "Should not be changed");
        relinkSource<0>(vtxp);
    }

    template <size_t A = Arity>
    typename std::enable_if<A == 2, DfgVertex*>::type rhsp() const {
        static_assert(A == Arity, "Should not be changed");
        return source<1>();
    }
    template <size_t A = Arity>
    typename std::enable_if<A == 2, void>::type rhsp(DfgVertex* vtxp) {
        static_assert(A == Arity, "Should not be changed");
        relinkSource<1>(vtxp);
    }
};
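
The named accessors above rely on `std::enable_if` with a defaulted template parameter (`A = Arity`), so each accessor only exists for the matching arity. A minimal sketch of that technique follows, assuming a hypothetical `Node` class whose sources are plain `int` values rather than graph edges; the names `Node`, `setSource`, `src`, `lhs` and `rhs` are made up for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical simplified node: sources are plain ints rather than graph edges.
template <std::size_t Arity>
class Node {
    std::array<int, Arity> m_srcs{};  // One slot per source operand

public:
    // Indexed access, with the index checked at compile time
    template <std::size_t Index>
    int source() const {
        static_assert(Index < Arity, "Source index out of range");
        return m_srcs[Index];
    }
    template <std::size_t Index>
    void setSource(int value) {
        static_assert(Index < Arity, "Source index out of range");
        m_srcs[Index] = value;
    }

    // Named getter that is only usable when Arity == 1
    template <std::size_t A = Arity>
    typename std::enable_if<A == 1, int>::type src() const {
        static_assert(A == Arity, "Should not be changed");
        return source<0>();
    }

    // Named getters that are only usable when Arity == 2
    template <std::size_t A = Arity>
    typename std::enable_if<A == 2, int>::type lhs() const {
        static_assert(A == Arity, "Should not be changed");
        return source<0>();
    }
    template <std::size_t A = Arity>
    typename std::enable_if<A == 2, int>::type rhs() const {
        static_assert(A == Arity, "Should not be changed");
        return source<1>();
    }
};
```

With this sketch, `Node<2>{}.lhs()` compiles while `Node<2>{}.src()` is rejected at compile time, because no viable `src` overload remains once `enable_if` removes it. The extra `static_assert(A == Arity, ...)` guards against a caller overriding the defaulted parameter explicitly.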

class DfgVertexVariadic VL_NOT_FINAL : public DfgVertex {
    DfgEdge* m_srcsp;  // The source edges
    uint32_t m_srcCnt = 0;  // Number of sources used
    uint32_t m_srcCap;  // Number of sources allocated

    // Allocate a new source edge array
    DfgEdge* allocSources(size_t n) {
        DfgEdge* const srcsp = new DfgEdge[n];
        for (size_t i = 0; i < n; ++i) srcsp[i].init(this);
        return srcsp;
    }

    // Double the capacity of m_srcsp (a capacity of 0 would stay 0, so it must be non-zero)
    void growSources() {
        m_srcCap *= 2;
        DfgEdge* const newsp = allocSources(m_srcCap);
        for (size_t i = 0; i < m_srcCnt; ++i) {
            DfgEdge* const oldp = m_srcsp + i;
            // Skip over unlinked source edge
            if (!oldp->sourcep()) continue;
            // New edge driven from the same vertex as the old edge
            newsp[i].relinkSource(oldp->sourcep());
            // Unlink the old edge, it will be deleted
            oldp->unlinkSource();
        }
        // Delete old source edges
        delete[] m_srcsp;
        // Keep hold of new source edges
        m_srcsp = newsp;
    }

protected:
    DfgVertexVariadic(DfgGraph& dfg, FileLine* flp, AstNodeDType* dtypep, DfgType type,
                      uint32_t initialCapacity = 1)
        : DfgVertex{dfg, flp, dtypep, type}
        , m_srcsp{allocSources(initialCapacity)}
        , m_srcCap{initialCapacity} {}

    ~DfgVertexVariadic() override { delete[] m_srcsp; }

    // Append a new source edge, growing the edge array if it is full.
    // Growth reallocates the array, so previously returned edge pointers become invalid.
    DfgEdge* addSource() {
        if (m_srcCnt == m_srcCap) growSources();
        return m_srcsp + m_srcCnt++;
    }

    // Forget all source edges. All of them must already be unlinked.
    void resetSources() {
        // #ifdef VL_DEBUG TODO: DEBUG ONLY
        for (uint32_t i = 0; i < m_srcCnt; ++i) {
            UASSERT_OBJ(!m_srcsp[i].sourcep(), m_srcsp[i].sourcep(), "Connected source");
        }
        // #endif
        m_srcCnt = 0;
    }

public:
    DfgEdge* sourceEdge(size_t idx) const { return &m_srcsp[idx]; }
    DfgVertex* source(size_t idx) const { return m_srcsp[idx].sourcep(); }

    std::pair<DfgEdge*, size_t> sourceEdges() override { return {m_srcsp, m_srcCnt}; }
    std::pair<const DfgEdge*, size_t> sourceEdges() const override { return {m_srcsp, m_srcCnt}; }
};
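
The growth strategy above doubles the capacity and relinks each edge instead of copying raw memory, presumably because an edge is also registered with the vertex that drives it. The sketch below illustrates that idea under simplifying assumptions: `Source`, `Edge` and `EdgeList` are hypothetical stand-ins, and a plain fanout counter replaces whatever bookkeeping the real `DfgEdge` performs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the real DfgEdge/DfgVertex.
struct Source {
    int fanout = 0;  // Number of edges currently driven by this source
};

class Edge {
    Source* m_sourcep = nullptr;  // Driver of this edge, or nullptr if unlinked

public:
    Source* sourcep() const { return m_sourcep; }
    void unlinkSource() {
        if (m_sourcep) --m_sourcep->fanout;
        m_sourcep = nullptr;
    }
    void relinkSource(Source* newp) {
        unlinkSource();
        m_sourcep = newp;
        if (newp) ++newp->fanout;
    }
};

class EdgeList {
    Edge* m_edgesp;  // Heap allocated edge array
    uint32_t m_count = 0;  // Number of edges in use
    uint32_t m_capacity;  // Number of edges allocated

    // Double the capacity, moving each link over to the new array
    void grow() {
        const uint32_t newCapacity = m_capacity * 2;
        Edge* const newp = new Edge[newCapacity];
        for (uint32_t i = 0; i < m_count; ++i) {
            Source* const srcp = m_edgesp[i].sourcep();
            if (!srcp) continue;  // Unlinked edges need no bookkeeping
            newp[i].relinkSource(srcp);
            m_edgesp[i].unlinkSource();
        }
        delete[] m_edgesp;
        m_edgesp = newp;
        m_capacity = newCapacity;
    }

public:
    // A requested capacity of 0 is bumped to 1, since doubling 0 would never grow
    explicit EdgeList(uint32_t initialCapacity = 1)
        : m_edgesp{new Edge[initialCapacity > 0 ? initialCapacity : 1]}
        , m_capacity{initialCapacity > 0 ? initialCapacity : 1} {}
    ~EdgeList() {
        for (uint32_t i = 0; i < m_count; ++i) m_edgesp[i].unlinkSource();
        delete[] m_edgesp;
    }
    EdgeList(const EdgeList&) = delete;
    EdgeList& operator=(const EdgeList&) = delete;

    // Append a new, unlinked edge; may reallocate and invalidate earlier pointers
    Edge* add() {
        if (m_count == m_capacity) grow();
        return m_edgesp + m_count++;
    }
    Edge* at(uint32_t idx) { return m_edgesp + idx; }
    uint32_t size() const { return m_count; }
    uint32_t capacity() const { return m_capacity; }
};
```

Doubling keeps the amortized cost of `add()` constant, at the price of invalidating edge pointers on growth, so callers of this sketch should link the returned edge immediately and re-fetch edges by index afterwards.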
//------------------------------------------------------------------------------
// Vertex classes
//------------------------------------------------------------------------------

class DfgVertexLValue VL_NOT_FINAL : public DfgVertexVariadic {
    AstVar* const m_varp;  // The AstVar associated with this vertex (not owned by this vertex)
    bool m_hasModRefs = false;  // This AstVar is referenced outside the DFG, but in the module
    bool m_hasExtRefs = false;  // This AstVar is referenced from outside the module

public:
    DfgVertexLValue(DfgGraph& dfg, DfgType type, AstVar* varp, uint32_t initialCapacity)
        : DfgVertexVariadic{dfg, varp->fileline(), dtypeFor(varp), type, initialCapacity}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
, m_varp{varp} {}
|
|
|
|
|
|
|
|
|
|
AstVar* varp() const { return m_varp; }
|
|
|
|
|
bool hasModRefs() const { return m_hasModRefs; }
|
|
|
|
|
void setHasModRefs() { m_hasModRefs = true; }
|
|
|
|
|
bool hasExtRefs() const { return m_hasExtRefs; }
|
|
|
|
|
void setHasExtRefs() { m_hasExtRefs = true; }
|
|
|
|
|
bool hasRefs() const { return m_hasModRefs || m_hasExtRefs; }
|
|
|
|
|
|
|
|
|
|
// Variable cannot be removed, even if redundant in the DfgGraph (might be used externally)
|
|
|
|
|
bool keep() const {
|
|
|
|
|
// Keep if referenced outside this module
|
|
|
|
|
if (hasExtRefs()) return true;
|
|
|
|
|
// Keep if traced
|
|
|
|
|
if (v3Global.opt.trace() && varp()->isTrace()) return true;
|
|
|
|
|
// Keep if public
|
|
|
|
|
if (varp()->isSigPublic()) return true;
|
|
|
|
|
// Otherwise it can be removed
|
|
|
|
|
return false;
|
|
|
|
|
}
|
2022-09-25 17:03:15 +02:00
|
|
|
};

class DfgVarPacked final : public DfgVertexLValue {
    friend class DfgVertex;
    friend class DfgVisitor;

    using DriverData = std::pair<FileLine*, uint32_t>;

    std::vector<DriverData> m_driverData;  // Additional data associated with each driver

    void accept(DfgVisitor& visitor) override;
    bool selfEquals(const DfgVertex& that) const override;
    V3Hash selfHash() const override;
    static constexpr DfgType dfgType() { return DfgType::atVar; }

public:
    DfgVarPacked(DfgGraph& dfg, AstVar* varp)
        : DfgVertexLValue{dfg, dfgType(), varp, 1u} {}

    bool isDrivenByDfg() const { return arity() > 0; }
    bool isDrivenFullyByDfg() const { return arity() == 1 && source(0)->dtypep() == dtypep(); }

    void addDriver(FileLine* flp, uint32_t lsb, DfgVertex* vtxp) {
        m_driverData.emplace_back(flp, lsb);
        DfgVertexVariadic::addSource()->relinkSource(vtxp);
    }

    void resetSources() {
        m_driverData.clear();
        DfgVertexVariadic::resetSources();
    }

    // Remove undriven sources
    void packSources() {
        // Grab and reset the driver data
        std::vector<DriverData> driverData{std::move(m_driverData)};

        // Grab and unlink the sources
        std::vector<DfgVertex*> sources{arity()};
        forEachSourceEdge([&](DfgEdge& edge, size_t idx) {
            sources[idx] = edge.sourcep();
            edge.unlinkSource();
        });
        DfgVertexVariadic::resetSources();

        // Add back the driven sources
        for (size_t i = 0; i < sources.size(); ++i) {
            if (!sources[i]) continue;
            addDriver(driverData[i].first, driverData[i].second, sources[i]);
        }
    }

    FileLine* driverFileLine(size_t idx) const { return m_driverData[idx].first; }
    uint32_t driverLsb(size_t idx) const { return m_driverData[idx].second; }

    const string srcName(size_t idx) const override {
        return isDrivenFullyByDfg() ? "" : cvtToStr(driverLsb(idx));
    }
};
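
The `packSources` method above keeps the per-driver data aligned with the source edges while dropping undriven slots. The following is a minimal standalone sketch of that compaction idea only. `Vertex`, `PackedVar` and the plain `sources` vector are hypothetical stand-ins chosen for illustration, not Verilator types, and an `int` replaces `FileLine*`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Verilator types
struct Vertex {};
using DriverData = std::pair<int /*line*/, uint32_t /*lsb*/>;

struct PackedVar {
    std::vector<DriverData> driverData;  // Parallel to 'sources'
    std::vector<Vertex*> sources;  // nullptr marks an undriven slot

    void addDriver(int line, uint32_t lsb, Vertex* vtxp) {
        driverData.emplace_back(line, lsb);
        sources.push_back(vtxp);
    }

    // Drop undriven slots while keeping both vectors aligned
    void packSources() {
        assert(driverData.size() == sources.size());
        // Grab the old contents, then start from empty vectors
        std::vector<DriverData> oldData = std::move(driverData);
        std::vector<Vertex*> oldSources = std::move(sources);
        driverData.clear();
        sources.clear();
        // Add back only the driven slots, in their original order
        for (size_t i = 0; i < oldSources.size(); ++i) {
            if (!oldSources[i]) continue;
            addDriver(oldData[i].first, oldData[i].second, oldSources[i]);
        }
    }
};
```

Re-adding through `addDriver` rather than erasing in place means both vectors are rebuilt by the same code path that populates them, so they cannot drift out of step.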

class DfgVarArray final : public DfgVertexLValue {
    friend class DfgVertex;
    friend class DfgVisitor;

    using DriverData = std::pair<FileLine*, uint32_t>;

    std::vector<DriverData> m_driverData;  // Additional data associated with each driver

    void accept(DfgVisitor& visitor) override;
    bool selfEquals(const DfgVertex& that) const override;
    V3Hash selfHash() const override;
    static constexpr DfgType dfgType() { return DfgType::atUnpackArrayDType; }  // TODO: gross

public:
    DfgVarArray(DfgGraph& dfg, AstVar* varp)
        : DfgVertexLValue{dfg, dfgType(), varp, 4u} {
        UASSERT_OBJ(VN_IS(varp->dtypeSkipRefp(), UnpackArrayDType), varp, "Non array DfgVarArray");
    }

    bool isDrivenByDfg() const { return arity() > 0; }

    void addDriver(FileLine* flp, uint32_t index, DfgVertex* vtxp) {
        m_driverData.emplace_back(flp, index);
        DfgVertexVariadic::addSource()->relinkSource(vtxp);
    }

    void resetSources() {
        m_driverData.clear();
        DfgVertexVariadic::resetSources();
    }

    // Remove undriven sources
    void packSources() {
        // Grab and reset the driver data
        std::vector<DriverData> driverData{std::move(m_driverData)};

        // Grab and unlink the sources
        std::vector<DfgVertex*> sources{arity()};
        forEachSourceEdge([&](DfgEdge& edge, size_t idx) {
            sources[idx] = edge.sourcep();
            edge.unlinkSource();
        });
        DfgVertexVariadic::resetSources();

        // Add back the driven sources
        for (size_t i = 0; i < sources.size(); ++i) {
            if (!sources[i]) continue;
            addDriver(driverData[i].first, driverData[i].second, sources[i]);
        }
    }

    FileLine* driverFileLine(size_t idx) const { return m_driverData[idx].first; }
    uint32_t driverIndex(size_t idx) const { return m_driverData[idx].second; }

    DfgVertex* driverAt(size_t idx) const {
        const DfgEdge* const edgep = findSourceEdge([=](const DfgEdge&, size_t i) {  //
            return driverIndex(i) == idx;
        });
        return edgep ? edgep->sourcep() : nullptr;
    }

    const string srcName(size_t idx) const override { return cvtToStr(driverIndex(idx)); }
};
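
`driverAt` above looks up the driver of a given array element by searching the source edges with a predicate over the stored driver index. The sketch below shows that lookup pattern in isolation, assuming simplified stand-ins: `Vertex`, `ArrayVar`, `findSource` and the two parallel vectors are hypothetical names for illustration and are not part of Verilator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Verilator types
struct Vertex {};

struct ArrayVar {
    std::vector<uint32_t> driverIndex;  // Array element written by each driver
    std::vector<Vertex*> sources;  // Driver of each entry, parallel to 'driverIndex'

    void addDriver(uint32_t index, Vertex* vtxp) {
        assert(vtxp);
        driverIndex.push_back(index);
        sources.push_back(vtxp);
    }

    // First source slot for which 'p' holds, or nullptr if there is none
    Vertex* const* findSource(const std::function<bool(size_t)>& p) const {
        for (size_t i = 0; i < sources.size(); ++i) {
            if (p(i)) return &sources[i];
        }
        return nullptr;
    }

    // Driver of array element 'idx', or nullptr if that element is not driven here
    Vertex* driverAt(size_t idx) const {
        Vertex* const* const slotp = findSource([&](size_t i) { return driverIndex[i] == idx; });
        return slotp ? *slotp : nullptr;
    }
};
```

The search is linear in the number of drivers, which is a reasonable trade-off when each variable has only a handful of them.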

class DfgConst final : public DfgVertex {
    friend class DfgVertex;
    friend class DfgVisitor;

    AstConst* const m_constp;  // The AstConst associated with this vertex (owned by this vertex)

    void accept(DfgVisitor& visitor) override;
    bool selfEquals(const DfgVertex& that) const override;
    V3Hash selfHash() const override;
    static constexpr DfgType dfgType() { return DfgType::atConst; }

public:
    DfgConst(DfgGraph& dfg, AstConst* constp)
        : DfgVertex{dfg, constp->fileline(), dtypeFor(constp), dfgType()}
        , m_constp{constp} {}

    ~DfgConst() override { VL_DO_DANGLING(m_constp->deleteTree(), m_constp); }

    AstConst* constp() const { return m_constp; }
    V3Number& num() const { return m_constp->num(); }

    uint32_t toU32() const { return num().toUInt(); }
    int32_t toI32() const { return num().toSInt(); }

    bool isZero() const { return num().isEqZero(); }
    bool isOnes() const { return num().isEqAllOnes(width()); }

    std::pair<DfgEdge*, size_t> sourceEdges() override { return {nullptr, 0}; }
    std::pair<const DfgEdge*, size_t> sourceEdges() const override { return {nullptr, 0}; }
    const string srcName(size_t) const override {  // LCOV_EXCL_START
        VL_UNREACHABLE;
        return "";
    }  // LCOV_EXCL_STOP
};

// The rest of the DfgVertex subclasses are generated by 'astgen' from AstNodeMath nodes
#include "V3Dfg__gen_vertex_classes.h"

//------------------------------------------------------------------------------
// Dfg vertex visitor
//------------------------------------------------------------------------------

class DfgVisitor VL_NOT_FINAL {
public:
    // Dispatch to most specific 'visit' method on 'vtxp'
    void iterate(DfgVertex* vtxp) { vtxp->accept(*this); }

    virtual void visit(DfgVarPacked* vtxp);
    virtual void visit(DfgVarArray* vtxp);
    virtual void visit(DfgConst* vtxp);
#include "V3Dfg__gen_visitor_decls.h"
};
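
`DfgVisitor::iterate` relies on double dispatch: the virtual `accept` selects the concrete vertex class, and overload resolution on `this` then selects the matching `visit`. The following is a minimal sketch of that mechanism under simplified assumptions. All names in it (`Vertex`, `ConstVertex`, `VarVertex`, `Visitor`, `NameVisitor`) are hypothetical and only mirror the shape of the classes above.

```cpp
#include <cassert>
#include <string>

struct ConstVertex;
struct VarVertex;

// Hypothetical visitor: one 'visit' overload per concrete vertex type
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(ConstVertex*) {}
    virtual void visit(VarVertex*) {}
};

struct Vertex {
    virtual ~Vertex() = default;
    // First dispatch: virtual call resolves on the dynamic vertex type
    virtual void accept(Visitor& v) = 0;
};

struct ConstVertex final : Vertex {
    // Second dispatch: 'this' is a ConstVertex*, selecting that overload
    void accept(Visitor& v) override { v.visit(this); }
};

struct VarVertex final : Vertex {
    void accept(Visitor& v) override { v.visit(this); }
};

// Example visitor recording which overload was selected
struct NameVisitor final : Visitor {
    std::string last;
    void visit(ConstVertex*) override { last = "const"; }
    void visit(VarVertex*) override { last = "var"; }
    void iterate(Vertex* vtxp) {
        assert(vtxp);
        vtxp->accept(*this);
    }
};
```

A caller holding only a `Vertex*` can then call `iterate` and reach the most specific `visit` without any explicit type test.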

//------------------------------------------------------------------------------
// Inline method definitions
//------------------------------------------------------------------------------

void DfgGraph::addVertex(DfgVertex& vtx) {
    ++m_size;
    vtx.m_verticesEnt.pushBack(m_vertices, &vtx);
}

void DfgGraph::removeVertex(DfgVertex& vtx) {
    --m_size;
    vtx.m_verticesEnt.unlink(m_vertices, &vtx);
}

void DfgGraph::forEachVertex(std::function<void(DfgVertex&)> f) {
    for (DfgVertex *vtxp = m_vertices.begin(), *nextp; vtxp; vtxp = nextp) {
        nextp = vtxp->m_verticesEnt.nextp();
        f(*vtxp);
    }
}

void DfgGraph::forEachVertex(std::function<void(const DfgVertex&)> f) const {
    for (const DfgVertex* vtxp = m_vertices.begin(); vtxp; vtxp = vtxp->m_verticesEnt.nextp()) {
        f(*vtxp);
    }
}

void DfgGraph::forEachVertexInReverse(std::function<void(DfgVertex&)> f) {
    for (DfgVertex *vtxp = m_vertices.rbegin(), *nextp; vtxp; vtxp = nextp) {
        nextp = vtxp->m_verticesEnt.prevp();
        f(*vtxp);
    }
}
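
The non-const `forEachVertex` and `forEachVertexInReverse` fetch the neighbouring vertex before invoking the callback, which presumably lets the callback unlink the current vertex without breaking the traversal. The sketch below illustrates that idiom on a plain intrusive list. `Node` and `List` are hypothetical types written for this example and do not correspond to Verilator's list implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical intrusive doubly linked list node; not a Verilator type
struct Node {
    int value;
    Node* prevp = nullptr;
    Node* nextp = nullptr;
    explicit Node(int v) : value{v} {}
};

struct List {
    Node* headp = nullptr;
    Node* tailp = nullptr;
    size_t size = 0;

    void pushBack(Node& n) {
        n.prevp = tailp;
        n.nextp = nullptr;
        (tailp ? tailp->nextp : headp) = &n;
        tailp = &n;
        ++size;
    }

    void unlink(Node& n) {
        assert(size > 0);
        (n.prevp ? n.prevp->nextp : headp) = n.nextp;
        (n.nextp ? n.nextp->prevp : tailp) = n.prevp;
        n.prevp = n.nextp = nullptr;
        --size;
    }

    // Fetch the successor before the callback, so 'f' may unlink '*np'
    void forEach(const std::function<void(Node&)>& f) {
        for (Node *np = headp, *nextp; np; np = nextp) {
            nextp = np->nextp;
            f(*np);
        }
    }
};
```

Note that this only protects against removal of the node currently being visited. A callback that unlinks the saved successor would still invalidate the traversal.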
|
|
|
|
|
|
|
|
|
|
template <typename Vertex>
|
|
|
|
|
Vertex* DfgGraph::findVertex(std::function<bool(const Vertex&)> p) const {
|
|
|
|
|
static_assert(std::is_base_of<DfgVertex, Vertex>::value,
|
|
|
|
|
"'Vertex' must be subclass of 'DfgVertex'");
|
|
|
|
|
for (DfgVertex* vtxp = m_vertices.begin(); vtxp; vtxp = vtxp->m_verticesEnt.nextp()) {
|
|
|
|
|
if (Vertex* const vvtxp = vtxp->cast<Vertex>()) {
|
|
|
|
|
if (p(*vvtxp)) return vvtxp;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return nullptr;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void DfgVertex::forEachSource(std::function<void(const DfgVertex&)> f) const {
|
|
|
|
|
const auto pair = sourceEdges();
|
|
|
|
|
const DfgEdge* const edgesp = pair.first;
|
|
|
|
|
const size_t arity = pair.second;
|
|
|
|
|
for (size_t i = 0; i < arity; ++i) {
|
|
|
|
|
if (DfgVertex* const sourcep = edgesp[i].m_sourcep) f(*sourcep);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void DfgVertex::forEachSink(std::function<void(DfgVertex&)> f) {
|
2022-09-27 01:06:50 +02:00
|
|
|
for (const DfgEdge *edgep = m_sinksp, *nextp; edgep; edgep = nextp) {
|
|
|
|
|
nextp = edgep->m_nextp;
|
|
|
|
|
f(*edgep->m_sinkp);
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
}

void DfgVertex::forEachSink(std::function<void(const DfgVertex&)> f) const {
    for (const DfgEdge* edgep = m_sinksp; edgep; edgep = edgep->m_nextp) f(*edgep->m_sinkp);
}

void DfgVertex::forEachSourceEdge(std::function<void(DfgEdge&, size_t)> f) {
    const auto pair = sourceEdges();
    DfgEdge* const edgesp = pair.first;
    const size_t arity = pair.second;
    for (size_t i = 0; i < arity; ++i) f(edgesp[i], i);
}

void DfgVertex::forEachSourceEdge(std::function<void(const DfgEdge&, size_t)> f) const {
    const auto pair = sourceEdges();
    const DfgEdge* const edgesp = pair.first;
    const size_t arity = pair.second;
    for (size_t i = 0; i < arity; ++i) f(edgesp[i], i);
}

void DfgVertex::forEachSinkEdge(std::function<void(DfgEdge&)> f) {
    for (DfgEdge *edgep = m_sinksp, *nextp; edgep; edgep = nextp) {
        nextp = edgep->m_nextp;
        f(*edgep);
    }
}
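
The loop in forEachSinkEdge reads m_nextp before invoking f. A plausible
reason (an assumption here, the source does not state it) is that the
callback may then unlink the edge it was handed without breaking the
traversal. The sketch below shows that pattern on a stand-alone intrusive
list; `ToyEdge`, `ToyVertex`, `addSink` and `unlinkSink` are hypothetical
and only mimic the member names seen above.

```cpp
#include <cassert>
#include <functional>

// Hypothetical intrusive list node (NOT Verilator's DfgEdge)
struct ToyEdge {
    int id;
    ToyEdge* m_nextp = nullptr;
    ToyEdge* m_prevp = nullptr;
    explicit ToyEdge(int i) : id{i} {}
};

// Hypothetical owner of the sink list (NOT Verilator's DfgVertex)
struct ToyVertex {
    ToyEdge* m_sinksp = nullptr;  // Head of the sink edge list

    // Push the edge at the front of the list
    void addSink(ToyEdge& edge) {
        edge.m_prevp = nullptr;
        edge.m_nextp = m_sinksp;
        if (m_sinksp) m_sinksp->m_prevp = &edge;
        m_sinksp = &edge;
    }

    // Remove the edge from the list and clear its links
    void unlinkSink(ToyEdge& edge) {
        if (edge.m_prevp) {
            edge.m_prevp->m_nextp = edge.m_nextp;
        } else {
            m_sinksp = edge.m_nextp;
        }
        if (edge.m_nextp) edge.m_nextp->m_prevp = edge.m_prevp;
        edge.m_nextp = nullptr;
        edge.m_prevp = nullptr;
    }

    // Same shape as the loop above: fetch the successor first, so 'f' may
    // unlink the current edge (but not the next one).
    void forEachSinkEdge(const std::function<void(ToyEdge&)>& f) {
        for (ToyEdge *edgep = m_sinksp, *nextp; edgep; edgep = nextp) {
            nextp = edgep->m_nextp;
            f(*edgep);
        }
    }

    int countSinks() const {
        int n = 0;
        for (const ToyEdge* edgep = m_sinksp; edgep; edgep = edgep->m_nextp) ++n;
        return n;
    }
};
```

Had the loop advanced with `edgep = edgep->m_nextp` after the call, the
traversal would stop early as soon as the callback cleared the links of
the current edge.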

void DfgVertex::forEachSinkEdge(std::function<void(const DfgEdge&)> f) const {
    for (DfgEdge *edgep = m_sinksp, *nextp; edgep; edgep = nextp) {
        nextp = edgep->m_nextp;
        f(*edgep);
    }
}

2022-09-30 17:19:53 +02:00
const DfgEdge* DfgVertex::findSourceEdge(std::function<bool(const DfgEdge&, size_t)> p) const {
    const auto pair = sourceEdges();
    const DfgEdge* const edgesp = pair.first;
    const size_t arity = pair.second;
    for (size_t i = 0; i < arity; ++i) {
        const DfgEdge& edge = edgesp[i];
        if (p(edge, i)) return &edge;
    }
    return nullptr;
}

template <typename Vertex>
Vertex* DfgVertex::findSink(std::function<bool(const Vertex&)> p) const {
    static_assert(std::is_base_of<DfgVertex, Vertex>::value,
                  "'Vertex' must be subclass of 'DfgVertex'");
    for (DfgEdge* edgep = m_sinksp; edgep; edgep = edgep->m_nextp) {
        if (Vertex* const sinkp = edgep->m_sinkp->cast<Vertex>()) {
            if (p(*sinkp)) return sinkp;
        }
    }
    return nullptr;
}

template <typename Vertex>
Vertex* DfgVertex::findSink() const {
    static_assert(!std::is_same<DfgVertex, Vertex>::value,
                  "'Vertex' must be proper subclass of 'DfgVertex'");
    return findSink<Vertex>([](const Vertex&) { return true; });
}
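
The two findSink overloads above combine a type filter with an optional
predicate. The following stand-alone sketch shows how such a pair might
be used. It is not Verilator code: `ToyVertex`, `ToyNot` and `ToyAnd` are
hypothetical, sinks are kept in a plain vector, and `dynamic_cast` stands
in for the `cast<Vertex>()` call, whose implementation is not shown here.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical base vertex (NOT Verilator's DfgVertex)
struct ToyVertex {
    std::vector<ToyVertex*> sinks;  // Vertices consuming this vertex
    virtual ~ToyVertex() = default;

    // First sink of type 'Vertex' that satisfies predicate 'p', or nullptr
    template <typename Vertex>
    Vertex* findSink(const std::function<bool(const Vertex&)>& p) const {
        static_assert(std::is_base_of<ToyVertex, Vertex>::value,
                      "'Vertex' must be subclass of 'ToyVertex'");
        for (ToyVertex* const sinkp : sinks) {
            if (Vertex* const typedp = dynamic_cast<Vertex*>(sinkp)) {
                if (p(*typedp)) return typedp;
            }
        }
        return nullptr;
    }

    // First sink of type 'Vertex', or nullptr
    template <typename Vertex>
    Vertex* findSink() const {
        static_assert(!std::is_same<ToyVertex, Vertex>::value,
                      "'Vertex' must be proper subclass of 'ToyVertex'");
        return findSink<Vertex>([](const Vertex&) { return true; });
    }
};

// Hypothetical operation vertices
struct ToyNot : ToyVertex {
    int width;
    explicit ToyNot(int w) : width{w} {}
};
struct ToyAnd : ToyVertex {};
```

A caller can then write `srcp->findSink<ToyNot>()` to ask whether any
consumer is a negation, or pass a lambda to narrow the match further,
for example by width.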

bool DfgVertex::isZero() const {
    if (const DfgConst* const constp = cast<DfgConst>()) return constp->isZero();
    return false;
}

bool DfgVertex::isOnes() const {
    if (const DfgConst* const constp = cast<DfgConst>()) return constp->isOnes();
    return false;
}

#endif