Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// -*- mode: C++; c-file-style: "cc-mode" -*-
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
// DESCRIPTION: Verilator: Peephole optimizations over DfgGraph
|
|
|
|
|
//
|
|
|
|
|
// Code available from: https://verilator.org
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
2026-01-27 02:24:34 +01:00
|
|
|
// This program is free software; you can redistribute it and/or modify it
|
|
|
|
|
// under the terms of either the GNU Lesser General Public License Version 3
|
|
|
|
|
// or the Perl Artistic License Version 2.0.
|
|
|
|
|
// SPDX-FileCopyrightText: 2003-2026 Wilson Snyder
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
//
|
|
|
|
|
// A pattern-matching based optimizer for DfgGraph. This is in some aspects similar to V3Const, but
|
|
|
|
|
// more powerful in that it does not care about ordering combinational statement. This is also less
|
|
|
|
|
// broadly applicable than V3Const, as it does not apply to procedural statements with sequential
|
|
|
|
|
// execution semantics.
|
|
|
|
|
//
|
|
|
|
|
//*************************************************************************
|
|
|
|
|
|
2023-10-18 12:37:46 +02:00
|
|
|
#include "V3PchAstNoMT.h" // VL_MT_DISABLED_CODE_UNIT
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3Dfg.h"
|
2024-03-06 19:01:52 +01:00
|
|
|
#include "V3DfgCache.h"
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3DfgPasses.h"
|
2025-07-26 21:37:01 +02:00
|
|
|
#include "V3DfgPeepholePatterns.h"
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include "V3Stats.h"
|
|
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
#include <algorithm>
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#include <cctype>
|
2026-01-31 18:28:14 +01:00
|
|
|
#include <vector>
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
VL_DEFINE_DEBUG_FUNCTIONS;
|
|
|
|
|
|
2025-07-26 21:37:01 +02:00
|
|
|
V3DfgPeepholeContext::V3DfgPeepholeContext(V3DfgContext& ctx, const std::string& label)
|
|
|
|
|
: V3DfgSubContext{ctx, label, "Peephole"} {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
const auto checkEnabled = [this](VDfgPeepholePattern id) {
|
2025-07-26 21:37:01 +02:00
|
|
|
std::string str{id.ascii()};
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { //
|
|
|
|
|
return c == '_' ? '-' : std::tolower(c);
|
|
|
|
|
});
|
|
|
|
|
m_enabled[id] = v3Global.opt.fDfgPeepholeEnabled(str);
|
|
|
|
|
};
|
|
|
|
|
#define OPTIMIZATION_CHECK_ENABLED(id, name) checkEnabled(VDfgPeepholePattern::id);
|
|
|
|
|
FOR_EACH_DFG_PEEPHOLE_OPTIMIZATION(OPTIMIZATION_CHECK_ENABLED)
|
|
|
|
|
#undef OPTIMIZATION_CHECK_ENABLED
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
V3DfgPeepholeContext::~V3DfgPeepholeContext() {
|
|
|
|
|
const auto emitStat = [this](VDfgPeepholePattern id) {
|
2025-07-26 21:37:01 +02:00
|
|
|
std::string str{id.ascii()};
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { //
|
|
|
|
|
return c == '_' ? ' ' : std::tolower(c);
|
|
|
|
|
});
|
2025-07-26 21:37:01 +02:00
|
|
|
addStat(str, m_count[id]);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
#define OPTIMIZATION_EMIT_STATS(id, name) emitStat(VDfgPeepholePattern::id);
|
|
|
|
|
FOR_EACH_DFG_PEEPHOLE_OPTIMIZATION(OPTIMIZATION_EMIT_STATS)
|
|
|
|
|
#undef OPTIMIZATION_EMIT_STATS
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
// clang-format off
|
|
|
|
|
template <typename T_Reduction>
|
|
|
|
|
struct ReductionToBitwiseImpl {};
|
|
|
|
|
template <> struct ReductionToBitwiseImpl<DfgRedAnd> { using type = DfgAnd; };
|
|
|
|
|
template <> struct ReductionToBitwiseImpl<DfgRedOr> { using type = DfgOr; };
|
|
|
|
|
template <> struct ReductionToBitwiseImpl<DfgRedXor> { using type = DfgXor; };
|
2023-11-11 05:25:53 +01:00
|
|
|
template <typename T_Reduction>
|
|
|
|
|
using ReductionToBitwise = typename ReductionToBitwiseImpl<T_Reduction>::type;
|
2022-09-30 17:19:53 +02:00
|
|
|
|
|
|
|
|
template <typename T_Bitwise>
|
|
|
|
|
struct BitwiseToReductionImpl {};
|
|
|
|
|
template <> struct BitwiseToReductionImpl<DfgAnd> { using type = DfgRedAnd; };
|
|
|
|
|
template <> struct BitwiseToReductionImpl<DfgOr> { using type = DfgRedOr; };
|
|
|
|
|
template <> struct BitwiseToReductionImpl<DfgXor> { using type = DfgRedXor; };
|
2023-11-11 05:25:53 +01:00
|
|
|
template <typename T_Reduction>
|
|
|
|
|
using BitwiseToReduction = typename BitwiseToReductionImpl<T_Reduction>::type;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
namespace {
|
|
|
|
|
template<typename Vertex> void foldOp(V3Number& out, const V3Number& src);
|
|
|
|
|
template <> void foldOp<DfgExtend> (V3Number& out, const V3Number& src) { out.opAssign(src); }
|
|
|
|
|
template <> void foldOp<DfgExtendS> (V3Number& out, const V3Number& src) { out.opExtendS(src, src.width()); }
|
|
|
|
|
template <> void foldOp<DfgLogNot> (V3Number& out, const V3Number& src) { out.opLogNot(src); }
|
|
|
|
|
template <> void foldOp<DfgNegate> (V3Number& out, const V3Number& src) { out.opNegate(src); }
|
|
|
|
|
template <> void foldOp<DfgNot> (V3Number& out, const V3Number& src) { out.opNot(src); }
|
|
|
|
|
template <> void foldOp<DfgRedAnd> (V3Number& out, const V3Number& src) { out.opRedAnd(src); }
|
|
|
|
|
template <> void foldOp<DfgRedOr> (V3Number& out, const V3Number& src) { out.opRedOr(src); }
|
|
|
|
|
template <> void foldOp<DfgRedXor> (V3Number& out, const V3Number& src) { out.opRedXor(src); }
|
|
|
|
|
|
|
|
|
|
template<typename Vertex> void foldOp(V3Number& out, const V3Number& lhs, const V3Number& rhs);
|
|
|
|
|
template <> void foldOp<DfgAdd> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opAdd(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgAnd> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opAnd(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgConcat> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opConcat(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgDiv> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opDiv(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgDivS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opDivS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgEq> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opEq(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGt> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGt(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGtS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGtS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGte> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGte(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgGteS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opGteS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogAnd> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogAnd(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogEq> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogEq(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogIf> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogIf(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLogOr> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLogOr(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLt> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLt(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLtS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLtS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgLte> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLte(lhs, rhs); }
|
2025-05-14 14:33:20 +02:00
|
|
|
template <> void foldOp<DfgLteS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opLteS(lhs, rhs); }
|
2022-10-04 22:18:39 +02:00
|
|
|
template <> void foldOp<DfgModDiv> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opModDiv(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgModDivS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opModDivS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgMul> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opMul(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgMulS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opMulS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgNeq> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opNeq(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgOr> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opOr(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPow> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPow(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPowSS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPowSS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPowSU> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPowSU(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgPowUS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opPowUS(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgReplicate> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opRepl(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgShiftL> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opShiftL(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgShiftR> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opShiftR(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgShiftRS> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opShiftRS(lhs, rhs, lhs.width()); }
|
|
|
|
|
template <> void foldOp<DfgSub> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opSub(lhs, rhs); }
|
|
|
|
|
template <> void foldOp<DfgXor> (V3Number& out, const V3Number& lhs, const V3Number& rhs) { out.opXor(lhs, rhs); }
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
// clang-format on
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
class V3DfgPeephole final : public DfgVisitor {
|
2026-03-23 15:41:36 +01:00
|
|
|
// TYPES
|
|
|
|
|
struct VertexInfo final {
|
2026-03-23 16:06:07 +01:00
|
|
|
size_t m_workListIndex = 0; // Position of this vertx m_wlist (0 means not in list)
|
|
|
|
|
size_t m_generation = 0; // Generation number of this vertex
|
2026-03-23 15:41:36 +01:00
|
|
|
};
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// STATE
|
|
|
|
|
DfgGraph& m_dfg; // The DfgGraph being visited
|
|
|
|
|
V3DfgPeepholeContext& m_ctx; // The config structure
|
2025-09-07 21:38:50 +02:00
|
|
|
const DfgDataType& m_bitDType = DfgDataType::packed(1); // Common, so grab it up front
|
2026-03-23 15:41:36 +01:00
|
|
|
std::vector<DfgVertex*> m_wlist; // Using a manual work list as need special operations
|
|
|
|
|
DfgUserMap<VertexInfo> m_vInfo = m_dfg.makeUserMap<VertexInfo>(); // Map to VertexInfo
|
2026-03-23 10:09:47 +01:00
|
|
|
V3DfgCache m_cache{m_dfg}; // Vertex cache to avoid creating redundant vertices
|
|
|
|
|
DfgVertex* m_vtxp = nullptr; // Currently considered vertex
|
2026-03-23 16:06:07 +01:00
|
|
|
size_t m_currentGeneration = 0; // Current generation number
|
2025-09-03 14:55:33 +02:00
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
// STATIC STATE
|
|
|
|
|
static V3DebugBisect s_debugBisect; // Debug aid
|
2026-03-21 11:13:27 +01:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
#define APPLYING(id) if (checkApplying(VDfgPeepholePattern::id))
|
|
|
|
|
|
|
|
|
|
// METHODS
|
|
|
|
|
bool checkApplying(VDfgPeepholePattern id) {
|
2026-03-21 11:13:27 +01:00
|
|
|
if (VL_UNLIKELY(!m_ctx.m_enabled[id] || s_debugBisect.isStopped())) return false;
|
2025-05-23 02:29:32 +02:00
|
|
|
UINFO(9, "Applying DFG pattern " << id.ascii());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
++m_ctx.m_count[id];
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
void incrementGeneration() {
|
|
|
|
|
++m_currentGeneration;
|
|
|
|
|
// TODO: could sweep on overflow
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 15:41:36 +01:00
|
|
|
void addToWorkList(DfgVertex* vtxp) {
|
|
|
|
|
VertexInfo& vInfo = m_vInfo[*vtxp];
|
|
|
|
|
// If already on work list, ignore
|
|
|
|
|
if (vInfo.m_workListIndex) return;
|
|
|
|
|
// Add to work list
|
|
|
|
|
vInfo.m_workListIndex = m_wlist.size();
|
|
|
|
|
m_wlist.push_back(vtxp);
|
|
|
|
|
}
|
2022-11-04 16:50:20 +01:00
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
void removeFromWorkList(DfgVertex* vtxp) {
|
|
|
|
|
VertexInfo& vInfo = m_vInfo[*vtxp];
|
|
|
|
|
// m_wlist[0] is always nullptr, fine to assign same here
|
|
|
|
|
m_wlist[vInfo.m_workListIndex] = nullptr;
|
|
|
|
|
vInfo.m_workListIndex = 0;
|
|
|
|
|
}
|
|
|
|
|
|
2022-11-04 16:50:20 +01:00
|
|
|
void addSourcesToWorkList(DfgVertex* vtxp) {
|
2025-09-02 17:50:40 +02:00
|
|
|
vtxp->foreachSource([&](DfgVertex& src) {
|
|
|
|
|
addToWorkList(&src);
|
|
|
|
|
return false;
|
|
|
|
|
});
|
2022-11-04 16:50:20 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void addSinksToWorkList(DfgVertex* vtxp) {
|
2025-09-02 17:50:40 +02:00
|
|
|
vtxp->foreachSink([&](DfgVertex& src) {
|
|
|
|
|
addToWorkList(&src);
|
|
|
|
|
return false;
|
|
|
|
|
});
|
2022-11-04 16:50:20 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
void deleteVertex(DfgVertex* vtxp) {
|
2026-03-23 15:41:36 +01:00
|
|
|
UASSERT_OBJ(!m_vInfo[vtxp].m_workListIndex, vtxp, "Deleted Vertex is in work list");
|
2026-03-21 23:45:02 +01:00
|
|
|
UASSERT_OBJ(!vtxp->hasSinks(), vtxp, "Should not delete used vertex");
|
2026-03-23 16:06:07 +01:00
|
|
|
|
|
|
|
|
// Invalidate cache entry
|
|
|
|
|
m_cache.invalidate(vtxp);
|
|
|
|
|
|
|
|
|
|
// Gather source vertices - they might be duplicates, make unique using generation number
|
|
|
|
|
incrementGeneration();
|
|
|
|
|
std::vector<DfgVertex*> srcps;
|
|
|
|
|
srcps.reserve(vtxp->nInputs());
|
|
|
|
|
vtxp->foreachSource([&](DfgVertex& src) {
|
|
|
|
|
// If it's a variable, add to work list to see if it became redundant
|
|
|
|
|
if (src.is<DfgVertexVar>()) {
|
|
|
|
|
addToWorkList(&src);
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
// Gather unique sources
|
|
|
|
|
VertexInfo& vInfo = m_vInfo[src];
|
|
|
|
|
if (vInfo.m_generation != m_currentGeneration) srcps.push_back(&src);
|
|
|
|
|
vInfo.m_generation = m_currentGeneration;
|
|
|
|
|
return false;
|
|
|
|
|
});
|
|
|
|
|
|
2026-02-14 13:33:20 +01:00
|
|
|
// This pass only removes variables that are either not driven in this graph,
|
|
|
|
|
// or are not observable outside the graph. If there is also no external write
|
|
|
|
|
// to the variable and no references in other graph then delete the Ast var too.
|
|
|
|
|
const DfgVertexVar* const varp = vtxp->cast<DfgVertexVar>();
|
2026-03-23 16:06:07 +01:00
|
|
|
if (varp && !varp->isVolatile() && !varp->hasDfgRefs()) {
|
|
|
|
|
AstNode* const nodep = varp->nodep();
|
|
|
|
|
VL_DO_DANGLING(vtxp->unlinkDelete(m_dfg), vtxp);
|
|
|
|
|
VL_DO_DANGLING(nodep->unlinkFrBack()->deleteTree(), nodep);
|
|
|
|
|
} else {
|
|
|
|
|
VL_DO_DANGLING(vtxp->unlinkDelete(m_dfg), vtxp);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Partition sources into used/unused after removing the vertex
|
|
|
|
|
const auto mid
|
|
|
|
|
= std::stable_partition(srcps.begin(), srcps.end(), [](DfgVertex* srcp) { //
|
|
|
|
|
return srcp->hasSinks();
|
|
|
|
|
});
|
|
|
|
|
|
|
|
|
|
// Add used sources to the work list
|
|
|
|
|
for (auto it = srcps.begin(); it != mid; ++it) addToWorkList(*it);
|
|
|
|
|
|
|
|
|
|
// Recursively delete unused sources
|
|
|
|
|
for (auto it = mid; it != srcps.end(); ++it) {
|
|
|
|
|
// Remove from work list
|
|
|
|
|
removeFromWorkList(*it);
|
|
|
|
|
// Delete vertex
|
|
|
|
|
deleteVertex(*it);
|
|
|
|
|
}
|
2022-11-04 16:50:20 +01:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
// Replace 'm_vtxp' with the given vertex
|
|
|
|
|
void replace(DfgVertex* resp) {
|
|
|
|
|
// Should not be in the work list
|
2026-03-23 15:41:36 +01:00
|
|
|
UASSERT_OBJ(!m_vInfo[m_vtxp].m_workListIndex, m_vtxp, "Replaced Vertex is in work list");
|
2026-03-23 10:09:47 +01:00
|
|
|
|
|
|
|
|
// Debug bisect check
|
2026-03-21 11:13:27 +01:00
|
|
|
const auto debugCallback = [&]() -> void {
|
2026-03-23 10:09:47 +01:00
|
|
|
UINFO(0, "Problematic DfgPeephole replacement: " << m_vtxp << " -> " << resp);
|
|
|
|
|
m_dfg.sourceCone({m_vtxp, resp});
|
|
|
|
|
const auto cone = m_dfg.sourceCone({m_vtxp, resp});
|
2026-03-21 11:13:27 +01:00
|
|
|
m_dfg.dumpDotFilePrefixed("peephole-broken", [&](const DfgVertex& v) { //
|
|
|
|
|
return cone->count(&v);
|
|
|
|
|
});
|
|
|
|
|
};
|
|
|
|
|
if (VL_UNLIKELY(s_debugBisect.stop(debugCallback))) return;
|
|
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
// Remove sinks from cache, their inputs are about to change
|
|
|
|
|
m_vtxp->foreachSink([&](DfgVertex& dst) {
|
|
|
|
|
m_cache.invalidate(&dst);
|
2025-09-02 17:50:40 +02:00
|
|
|
return false;
|
|
|
|
|
});
|
2026-03-21 23:45:02 +01:00
|
|
|
// Replace vertex with the replacement
|
2026-03-23 10:09:47 +01:00
|
|
|
m_vtxp->replaceWith(resp);
|
|
|
|
|
// Re-cache all sinks of the replacement
|
2026-03-23 16:06:07 +01:00
|
|
|
resp->foreachSink([&](DfgVertex& dst) {
|
|
|
|
|
m_cache.cache(&dst);
|
2026-03-23 10:09:47 +01:00
|
|
|
return false;
|
|
|
|
|
});
|
2026-03-23 16:06:07 +01:00
|
|
|
|
|
|
|
|
// Original vertex is now unused, so delete it
|
2026-03-23 10:09:47 +01:00
|
|
|
deleteVertex(m_vtxp);
|
2026-03-23 16:06:07 +01:00
|
|
|
|
|
|
|
|
// Add all sources to the work list
|
|
|
|
|
addSourcesToWorkList(resp);
|
|
|
|
|
// Add replacement to the work list
|
|
|
|
|
addToWorkList(resp);
|
|
|
|
|
// Add sinks of replaced vertex to the work list
|
|
|
|
|
addSinksToWorkList(resp);
|
2022-11-04 16:50:20 +01:00
|
|
|
}
|
|
|
|
|
|
2022-10-07 16:44:14 +02:00
|
|
|
// Create a 32-bit DfgConst vertex
|
|
|
|
|
DfgConst* makeI32(FileLine* flp, uint32_t val) { return new DfgConst{m_dfg, flp, 32, val}; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-07 16:44:14 +02:00
|
|
|
// Create a DfgConst vertex with the given width and value zero
|
2025-09-02 17:50:40 +02:00
|
|
|
DfgConst* makeZero(FileLine* flp, uint32_t width) {
|
|
|
|
|
return new DfgConst{m_dfg, flp, width, 0};
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-11-04 16:50:20 +01:00
|
|
|
// Create a new vertex of the given type
|
2024-03-06 19:01:52 +01:00
|
|
|
template <typename Vertex, typename... Operands>
|
2025-09-07 21:38:50 +02:00
|
|
|
Vertex* make(FileLine* flp, const DfgDataType& dtype, Operands... operands) {
|
2024-03-06 19:01:52 +01:00
|
|
|
// Find or create an equivalent vertex
|
2025-09-07 21:38:50 +02:00
|
|
|
Vertex* const vtxp = m_cache.getOrCreate<Vertex, Operands...>(flp, dtype, operands...);
|
2026-03-21 11:15:18 +01:00
|
|
|
// Sanity check
|
|
|
|
|
UASSERT_OBJ(vtxp->dtype() == dtype, vtxp, "Vertex dtype mismatch");
|
|
|
|
|
if (VL_UNLIKELY(v3Global.opt.debugCheck())) vtxp->typeCheck(m_dfg);
|
2025-09-02 23:21:24 +02:00
|
|
|
// Add to work list
|
|
|
|
|
addToWorkList(vtxp);
|
2022-11-04 16:50:20 +01:00
|
|
|
// Return new node
|
|
|
|
|
return vtxp;
|
|
|
|
|
}
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
// Same as above, but 'flp' and 'dtypep' are taken from the given example vertex
|
|
|
|
|
template <typename Vertex, typename... Operands>
|
2025-08-20 19:21:24 +02:00
|
|
|
Vertex* make(const DfgVertex* examplep, Operands... operands) {
|
2025-09-07 21:38:50 +02:00
|
|
|
return make<Vertex>(examplep->fileline(), examplep->dtype(), operands...);
|
2024-03-06 19:01:52 +01:00
|
|
|
}
|
|
|
|
|
|
2025-07-07 17:25:29 +02:00
|
|
|
// Check two vertex are the same, or the same constant value
|
|
|
|
|
static bool isSame(const DfgVertex* ap, const DfgVertex* bp) {
|
|
|
|
|
if (ap == bp) return true;
|
|
|
|
|
const DfgConst* const aConstp = ap->cast<DfgConst>();
|
|
|
|
|
if (!aConstp) return false;
|
|
|
|
|
const DfgConst* const bConstp = bp->cast<DfgConst>();
|
|
|
|
|
if (!bConstp) return false;
|
|
|
|
|
return aConstp->num().isCaseEq(bConstp->num());
|
|
|
|
|
}
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
static bool isZero(const DfgVertex* vtxp) {
|
|
|
|
|
if (const DfgConst* const constp = vtxp->cast<DfgConst>()) return constp->isZero();
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static bool isOnes(const DfgVertex* vtxp) {
|
|
|
|
|
if (const DfgConst* const constp = vtxp->cast<DfgConst>()) return constp->isOnes();
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
// Note: If any of the following transformers return true, then the vertex was replaced and the
|
|
|
|
|
// caller must not do any further changes, so the caller must check the return value, otherwise
|
|
|
|
|
// there will be hard to debug issues.
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
// Constant fold unary vertex, return true if folded
|
|
|
|
|
template <typename Vertex>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool foldUnary(Vertex* const vtxp) {
|
2022-10-04 22:18:39 +02:00
|
|
|
static_assert(std::is_base_of<DfgVertexUnary, Vertex>::value, "Must invoke on unary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
if (DfgConst* const srcp = vtxp->srcp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_UNARY) {
|
|
|
|
|
DfgConst* const resultp = makeZero(vtxp->fileline(), vtxp->width());
|
|
|
|
|
foldOp<Vertex>(resultp->num(), srcp->num());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(resultp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Constant fold binary vertex, return true if folded
|
|
|
|
|
template <typename Vertex>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool foldBinary(Vertex* const vtxp) {
|
2022-10-04 22:18:39 +02:00
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Vertex>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
2025-09-02 17:50:40 +02:00
|
|
|
if (DfgConst* const lhsp = vtxp->inputp(0)->template cast<DfgConst>()) {
|
|
|
|
|
if (DfgConst* const rhsp = vtxp->inputp(1)->template cast<DfgConst>()) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(FOLD_BINARY) {
|
|
|
|
|
DfgConst* const resultp = makeZero(vtxp->fileline(), vtxp->width());
|
|
|
|
|
foldOp<Vertex>(resultp->num(), lhsp->num(), rhsp->num());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(resultp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
// Transformations that apply to all associative binary vertices.
|
|
|
|
|
// Returns true if vtxp was replaced.
|
|
|
|
|
template <typename Vertex>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool associativeBinary(Vertex* const vtxp) {
|
2022-10-05 11:36:28 +02:00
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Vertex>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
DfgConst* const lConstp = lhsp->cast<DfgConst>();
|
|
|
|
|
DfgConst* const rConstp = rhsp->cast<DfgConst>();
|
|
|
|
|
|
|
|
|
|
if (lConstp && rConstp) {
|
|
|
|
|
APPLYING(FOLD_ASSOC_BINARY) {
|
|
|
|
|
DfgConst* const resultp = makeZero(flp, vtxp->width());
|
|
|
|
|
foldOp<Vertex>(resultp->num(), lConstp->num(), rConstp->num());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(resultp);
|
2022-10-05 11:36:28 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (lConstp) {
|
|
|
|
|
if (Vertex* const rVtxp = rhsp->cast<Vertex>()) {
|
|
|
|
|
if (DfgConst* const rlConstp = rVtxp->lhsp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_ASSOC_BINARY_LHS_OF_RHS) {
|
|
|
|
|
// Fold constants
|
|
|
|
|
const uint32_t width = std::is_same<DfgConcat, Vertex>::value
|
|
|
|
|
? lConstp->width() + rlConstp->width()
|
|
|
|
|
: vtxp->width();
|
|
|
|
|
DfgConst* const constp = makeZero(flp, width);
|
|
|
|
|
foldOp<Vertex>(constp->num(), lConstp->num(), rlConstp->num());
|
|
|
|
|
|
|
|
|
|
// Replace vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Vertex>(vtxp, constp, rVtxp->rhsp()));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (rConstp) {
|
|
|
|
|
if (Vertex* const lVtxp = lhsp->cast<Vertex>()) {
|
|
|
|
|
if (DfgConst* const lrConstp = lVtxp->rhsp()->template cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_ASSOC_BINARY_RHS_OF_LHS) {
|
|
|
|
|
// Fold constants
|
|
|
|
|
const uint32_t width = std::is_same<DfgConcat, Vertex>::value
|
|
|
|
|
? lrConstp->width() + rConstp->width()
|
|
|
|
|
: vtxp->width();
|
|
|
|
|
DfgConst* const constp = makeZero(flp, width);
|
|
|
|
|
foldOp<Vertex>(constp->num(), lrConstp->num(), rConstp->num());
|
|
|
|
|
|
|
|
|
|
// Replace vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Vertex>(vtxp, lVtxp->lhsp(), constp));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Make associative trees right leaning to reduce pattern variations, and for better CSE
|
2026-03-23 10:09:47 +01:00
|
|
|
if (Vertex* const alhsp = vtxp->lhsp()->template cast<Vertex>()) {
|
|
|
|
|
if (!alhsp->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(RIGHT_LEANING_ASSOC) {
|
|
|
|
|
// Rotate the expression tree rooted at 'vtxp' to the right, producing a
|
|
|
|
|
// right-leaning tree
|
|
|
|
|
DfgVertex* const ap = alhsp->lhsp();
|
|
|
|
|
DfgVertex* const bp = alhsp->rhsp();
|
|
|
|
|
DfgVertex* const cp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
// Concatenation dtypes need adjusting, other assoc vertices preserve types
|
|
|
|
|
const DfgDataType& childDType
|
|
|
|
|
= std::is_same<Vertex, DfgConcat>::value
|
|
|
|
|
? DfgDataType::packed(bp->width() + cp->width())
|
|
|
|
|
: vtxp->dtype();
|
|
|
|
|
|
|
|
|
|
Vertex* const bcp = make<Vertex>(vtxp->fileline(), childDType, bp, cp);
|
|
|
|
|
replace(make<Vertex>(alhsp->fileline(), vtxp->dtype(), ap, bcp));
|
|
|
|
|
return true;
|
|
|
|
|
}
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
|
|
|
|
}
|
2026-03-17 09:20:00 +01:00
|
|
|
|
|
|
|
|
// if (a OP (b OP c)), check if (a OP b) exists and if so replace with (a OP b) OP c
|
|
|
|
|
if (Vertex* rVtxp = vtxp->rhsp()->template cast<Vertex>()) {
|
|
|
|
|
const DfgDataType& dtype
|
|
|
|
|
= std::is_same<Vertex, DfgConcat>::value
|
|
|
|
|
? DfgDataType::packed(lhsp->width() + rVtxp->lhsp()->width())
|
|
|
|
|
: vtxp->dtype();
|
|
|
|
|
if (Vertex* const cVtxp = m_cache.get<Vertex>(dtype, lhsp, rVtxp->lhsp())) {
|
|
|
|
|
if (cVtxp->hasSinks() && cVtxp != rhsp) {
|
|
|
|
|
APPLYING(REUSE_ASSOC_BINARY) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Vertex>(vtxp, cVtxp, rVtxp->rhsp()));
|
2026-03-17 09:20:00 +01:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-05 11:36:28 +02:00
|
|
|
|
2026-03-17 09:20:00 +01:00
|
|
|
return false;
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Transformations that apply to all commutative binary vertices
|
2024-03-06 19:01:52 +01:00
|
|
|
template <typename Vertex>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool commutativeBinary(Vertex* const vtxp) {
|
2024-03-06 19:01:52 +01:00
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Vertex>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Vertex>::value, "Must invoke on final class");
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Ensure Const is on left-hand side to simplify other patterns
|
2024-03-06 19:01:52 +01:00
|
|
|
if (lhsp->is<DfgConst>()) return false;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (rhsp->is<DfgConst>()) {
|
|
|
|
|
APPLYING(SWAP_CONST_IN_COMMUTATIVE_BINARY) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Vertex>(vtxp, rhsp, lhsp));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// Ensure Not is on the left-hand side to simplify other patterns
|
2024-03-06 19:01:52 +01:00
|
|
|
if (lhsp->is<DfgNot>()) return false;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (rhsp->is<DfgNot>()) {
|
|
|
|
|
APPLYING(SWAP_NOT_IN_COMMUTATIVE_BINARY) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Vertex>(vtxp, rhsp, lhsp));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
2024-03-06 19:01:52 +01:00
|
|
|
// If both sides are variable references, order the side in some defined way. This
|
|
|
|
|
// allows CSE to later merge 'a op b' with 'b op a'.
|
2022-10-04 12:03:41 +02:00
|
|
|
if (lhsp->is<DfgVertexVar>() && rhsp->is<DfgVertexVar>()) {
|
2025-08-20 19:21:24 +02:00
|
|
|
const AstNode* const lVarp = lhsp->as<DfgVertexVar>()->nodep();
|
|
|
|
|
const AstNode* const rVarp = rhsp->as<DfgVertexVar>()->nodep();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (lVarp->name() > rVarp->name()) {
|
|
|
|
|
APPLYING(SWAP_VAR_IN_COMMUTATIVE_BINARY) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Vertex>(vtxp, rhsp, lhsp));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2024-03-06 19:01:52 +01:00
|
|
|
|
|
|
|
|
return false;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2025-07-07 17:25:29 +02:00
|
|
|
// Transformations that apply to all distributive and associative binary
|
|
|
|
|
// vertices 'Other' is the type that is distributive over 'Vertex',
|
|
|
|
|
// that is: a Other (b Vertex c) == (a Other b) Vertex (a Other c)
|
|
|
|
|
template <typename Other, typename Vertex>
|
|
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool distributiveAndAssociativeBinary(Vertex* vtxp) {
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
if (!lhsp->hasMultipleSinks() && !rhsp->hasMultipleSinks()) {
|
|
|
|
|
// Convert '(a Other b) Vertex (a Other c)' and associative
|
|
|
|
|
// variations to 'a Other (b Vertex c)'
|
|
|
|
|
if (Other* const lp = lhsp->cast<Other>()) {
|
|
|
|
|
if (Other* const rp = rhsp->cast<Other>()) {
|
|
|
|
|
DfgVertex* const llp = lp->lhsp();
|
|
|
|
|
DfgVertex* const lrp = lp->rhsp();
|
|
|
|
|
DfgVertex* const rlp = rp->lhsp();
|
|
|
|
|
DfgVertex* const rrp = rp->rhsp();
|
|
|
|
|
DfgVertex* ap = nullptr;
|
|
|
|
|
DfgVertex* bp = nullptr;
|
|
|
|
|
DfgVertex* cp = nullptr;
|
|
|
|
|
if (llp == rlp) {
|
|
|
|
|
ap = llp;
|
|
|
|
|
bp = lrp;
|
|
|
|
|
cp = rrp;
|
|
|
|
|
} else if (llp == rrp) {
|
|
|
|
|
ap = llp;
|
|
|
|
|
bp = lrp;
|
|
|
|
|
cp = rlp;
|
|
|
|
|
} else if (lrp == rlp) {
|
|
|
|
|
ap = lrp;
|
|
|
|
|
bp = llp;
|
|
|
|
|
cp = rrp;
|
|
|
|
|
} else if (lrp == rrp) {
|
|
|
|
|
ap = lrp;
|
|
|
|
|
bp = llp;
|
|
|
|
|
cp = rlp;
|
|
|
|
|
}
|
|
|
|
|
if (ap) {
|
|
|
|
|
APPLYING(REPLACE_DISTRIBUTIVE_BINARY) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Other>(vtxp, ap, make<Vertex>(lhsp, bp, cp)));
|
2025-07-07 17:25:29 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
// Bitwise operation with one side Const, and the other side a Concat
|
|
|
|
|
template <typename Vertex>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool
|
|
|
|
|
tryPushBitwiseOpThroughConcat(Vertex* const vtxp, DfgConst* constp, DfgConcat* concatp) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
// If at least one of the sides of the Concat constant, or width 1 (i.e.: can be
|
|
|
|
|
// further simplified), then push the Vertex past the Concat
|
|
|
|
|
if (concatp->lhsp()->is<DfgConst>() || concatp->rhsp()->is<DfgConst>() //
|
2025-09-07 21:38:50 +02:00
|
|
|
|| concatp->lhsp()->dtype() == m_bitDType || concatp->rhsp()->dtype() == m_bitDType) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(PUSH_BITWISE_OP_THROUGH_CONCAT) {
|
|
|
|
|
const uint32_t width = concatp->width();
|
2025-09-07 21:38:50 +02:00
|
|
|
const DfgDataType& lDtype = concatp->lhsp()->dtype();
|
|
|
|
|
const DfgDataType& rDtype = concatp->rhsp()->dtype();
|
|
|
|
|
const uint32_t lWidth = lDtype.size();
|
|
|
|
|
const uint32_t rWidth = rDtype.size();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The new Lhs vertex
|
|
|
|
|
DfgConst* const newLhsConstp = makeZero(constp->fileline(), lWidth);
|
|
|
|
|
newLhsConstp->num().opSel(constp->num(), width - 1, rWidth);
|
2025-09-07 21:38:50 +02:00
|
|
|
Vertex* const newLhsp = make<Vertex>(flp, lDtype, newLhsConstp, concatp->lhsp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The new Rhs vertex
|
|
|
|
|
DfgConst* const newRhsConstp = makeZero(constp->fileline(), rWidth);
|
|
|
|
|
newRhsConstp->num().opSel(constp->num(), rWidth - 1, 0);
|
2025-09-07 21:38:50 +02:00
|
|
|
Vertex* const newRhsp = make<Vertex>(flp, rDtype, newRhsConstp, concatp->rhsp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Replace this vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgConcat>(concatp, newLhsp, newRhsp));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <typename Vertex>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool
|
|
|
|
|
tryPushCompareOpThroughConcat(Vertex* const vtxp, DfgConst* constp, DfgConcat* concatp) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
// If at least one of the sides of the Concat is constant, then push the Vertex past
|
|
|
|
|
// the Concat
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (concatp->lhsp()->is<DfgConst>() || concatp->rhsp()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_COMPARE_OP_THROUGH_CONCAT) {
|
|
|
|
|
const uint32_t width = concatp->width();
|
|
|
|
|
const uint32_t lWidth = concatp->lhsp()->width();
|
|
|
|
|
const uint32_t rWidth = concatp->rhsp()->width();
|
|
|
|
|
|
|
|
|
|
// The new Lhs vertex
|
|
|
|
|
DfgConst* const newLhsConstp = makeZero(constp->fileline(), lWidth);
|
|
|
|
|
newLhsConstp->num().opSel(constp->num(), width - 1, rWidth);
|
2024-03-06 19:01:52 +01:00
|
|
|
Vertex* const newLhsp
|
|
|
|
|
= make<Vertex>(flp, m_bitDType, newLhsConstp, concatp->lhsp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The new Rhs vertex
|
|
|
|
|
DfgConst* const newRhsConstp = makeZero(constp->fileline(), rWidth);
|
|
|
|
|
newRhsConstp->num().opSel(constp->num(), rWidth - 1, 0);
|
2024-03-06 19:01:52 +01:00
|
|
|
Vertex* const newRhsp
|
|
|
|
|
= make<Vertex>(flp, m_bitDType, newRhsConstp, concatp->rhsp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The replacement Vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgVertexBinary* const resp
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
= std::is_same<Vertex, DfgEq>::value
|
2024-03-06 19:01:52 +01:00
|
|
|
? make<DfgAnd>(concatp->fileline(), m_bitDType, newLhsp, newRhsp)
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
: nullptr;
|
2026-03-23 10:09:47 +01:00
|
|
|
UASSERT_OBJ(resp, vtxp,
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
"Unhandled vertex type in 'tryPushCompareOpThroughConcat': "
|
|
|
|
|
<< vtxp->typeName());
|
|
|
|
|
|
|
|
|
|
// Replace this vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(resp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
template <typename Bitwise>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool tryPushBitwiseOpThroughReductions(Bitwise* const vtxp) {
|
2022-09-30 17:19:53 +02:00
|
|
|
using Reduction = BitwiseToReduction<Bitwise>;
|
|
|
|
|
|
|
|
|
|
if (Reduction* const lRedp = vtxp->lhsp()->template cast<Reduction>()) {
|
|
|
|
|
if (Reduction* const rRedp = vtxp->rhsp()->template cast<Reduction>()) {
|
|
|
|
|
DfgVertex* const lSrcp = lRedp->srcp();
|
|
|
|
|
DfgVertex* const rSrcp = rRedp->srcp();
|
2025-09-07 21:38:50 +02:00
|
|
|
if (lSrcp->dtype() == rSrcp->dtype() && lSrcp->width() <= 64
|
2022-10-04 19:26:53 +02:00
|
|
|
&& !lSrcp->hasMultipleSinks() && !rSrcp->hasMultipleSinks()) {
|
2022-09-30 17:19:53 +02:00
|
|
|
APPLYING(PUSH_BITWISE_THROUGH_REDUCTION) {
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
2025-09-07 21:38:50 +02:00
|
|
|
Bitwise* const bwp = make<Bitwise>(flp, lSrcp->dtype(), lSrcp, rSrcp);
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Reduction>(flp, m_bitDType, bwp));
|
2022-09-30 17:19:53 +02:00
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
template <typename Reduction>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool optimizeReduction(Reduction* const vtxp) {
|
2022-09-30 17:19:53 +02:00
|
|
|
using Bitwise = ReductionToBitwise<Reduction>;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (foldUnary(vtxp)) return true;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const srcp = vtxp->srcp();
|
2022-09-30 17:19:53 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
// Reduction of 1-bit value
|
2025-09-07 21:38:50 +02:00
|
|
|
if (srcp->dtype() == m_bitDType) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REMOVE_WIDTH_ONE_REDUCTION) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(srcp);
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
if (DfgCond* const condp = srcp->cast<DfgCond>()) {
|
|
|
|
|
if (condp->thenp()->is<DfgConst>() || condp->elsep()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_REDUCTION_THROUGH_COND_WITH_CONST_BRANCH) {
|
|
|
|
|
// The new 'then' vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
Reduction* const newThenp = make<Reduction>(flp, m_bitDType, condp->thenp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The new 'else' vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
Reduction* const newElsep = make<Reduction>(flp, m_bitDType, condp->elsep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The replacement Cond vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(condp->fileline(), m_bitDType,
|
|
|
|
|
condp->condp(), newThenp, newElsep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Replace this vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(newCondp);
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (DfgConcat* const concatp = srcp->cast<DfgConcat>()) {
|
2022-10-05 23:54:36 +02:00
|
|
|
if (concatp->lhsp()->is<DfgConst>() || concatp->rhsp()->is<DfgConst>()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
APPLYING(PUSH_REDUCTION_THROUGH_CONCAT) {
|
|
|
|
|
// Reduce the parts of the concatenation
|
2024-03-06 19:01:52 +01:00
|
|
|
Reduction* const lRedp
|
|
|
|
|
= make<Reduction>(concatp->fileline(), m_bitDType, concatp->lhsp());
|
|
|
|
|
Reduction* const rRedp
|
|
|
|
|
= make<Reduction>(concatp->fileline(), m_bitDType, concatp->rhsp());
|
2022-10-04 19:26:53 +02:00
|
|
|
|
|
|
|
|
// Bitwise reduce the results
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Bitwise>(flp, m_bitDType, lRedp, rRedp));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
2022-10-04 19:26:53 +02:00
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
|
|
|
|
}
|
2024-03-06 19:01:52 +01:00
|
|
|
|
|
|
|
|
return false;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
template <typename Shift>
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_ATTR_WARN_UNUSED_RESULT bool optimizeShiftRHS(Shift* const vtxp) {
|
2024-03-06 19:01:52 +01:00
|
|
|
static_assert(std::is_base_of<DfgVertexBinary, Shift>::value, "Must invoke on binary");
|
|
|
|
|
static_assert(std::is_final<Shift>::value, "Must invoke on final class");
|
|
|
|
|
if (const DfgConcat* const concatp = vtxp->rhsp()->template cast<DfgConcat>()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(concatp->lhsp())) { // Drop redundant zero extension
|
2022-11-04 16:50:20 +01:00
|
|
|
APPLYING(REMOVE_REDUNDANT_ZEXT_ON_RHS_OF_SHIFT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<Shift>(vtxp, vtxp->lhsp(), concatp->rhsp()));
|
2024-03-06 19:01:52 +01:00
|
|
|
return true;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2024-03-06 19:01:52 +01:00
|
|
|
return false;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// VISIT methods
|
|
|
|
|
|
|
|
|
|
void visit(DfgVertex*) override {}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexUnary
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgExtend* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
// Convert all Extend into Concat with zeros. This simplifies other patterns as they
|
|
|
|
|
// only need to handle Concat, which is more generic, and don't need special cases for
|
2022-09-30 17:19:53 +02:00
|
|
|
// Extend.
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_EXTEND) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgVertex* const zerop
|
|
|
|
|
= makeZero(vtxp->fileline(), vtxp->width() - vtxp->srcp()->width());
|
|
|
|
|
replace(make<DfgConcat>(vtxp, zerop, vtxp->srcp()));
|
2022-11-04 16:50:20 +01:00
|
|
|
return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgExtendS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLogNot* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgNegate* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgNot* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldUnary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Not of Cond
|
|
|
|
|
if (DfgCond* const condp = vtxp->srcp()->cast<DfgCond>()) {
|
|
|
|
|
// If at least one of the branches are a constant, push the Not past the Cond
|
|
|
|
|
if (condp->thenp()->is<DfgConst>() || condp->elsep()->is<DfgConst>()) {
|
|
|
|
|
APPLYING(PUSH_NOT_THROUGH_COND) {
|
|
|
|
|
// The new 'then' vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgNot* const newThenp = make<DfgNot>(vtxp, condp->thenp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The new 'else' vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgNot* const newElsep = make<DfgNot>(vtxp, condp->elsep());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// The replacement Cond vertex
|
2025-09-07 21:38:50 +02:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(condp->fileline(), vtxp->dtype(),
|
2024-03-06 19:01:52 +01:00
|
|
|
condp->condp(), newThenp, newElsep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
// Replace this vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(newCondp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Not of Not
|
|
|
|
|
if (DfgNot* const notp = vtxp->srcp()->cast<DfgNot>()) {
|
|
|
|
|
APPLYING(REMOVE_NOT_NOT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(notp->srcp());
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!vtxp->srcp()->hasMultipleSinks()) {
|
|
|
|
|
// Not of Eq
|
|
|
|
|
if (DfgEq* const eqp = vtxp->srcp()->cast<DfgEq>()) {
|
|
|
|
|
APPLYING(REPLACE_NOT_EQ) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(
|
|
|
|
|
make<DfgNeq>(eqp->fileline(), vtxp->dtype(), eqp->lhsp(), eqp->rhsp()));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 19:26:53 +02:00
|
|
|
// Not of Neq
|
|
|
|
|
if (DfgNeq* const neqp = vtxp->srcp()->cast<DfgNeq>()) {
|
|
|
|
|
APPLYING(REPLACE_NOT_NEQ) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(
|
|
|
|
|
make<DfgEq>(neqp->fileline(), vtxp->dtype(), neqp->lhsp(), neqp->rhsp()));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgRedOr* const vtxp) override {
|
2024-03-06 19:01:52 +01:00
|
|
|
if (optimizeReduction(vtxp)) return;
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgRedAnd* const vtxp) override {
|
2024-03-06 19:01:52 +01:00
|
|
|
if (optimizeReduction(vtxp)) return;
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgRedXor* const vtxp) override {
|
2024-03-06 19:01:52 +01:00
|
|
|
if (optimizeReduction(vtxp)) return;
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgSel* const vtxp) override {
|
2022-10-06 19:34:18 +02:00
|
|
|
DfgVertex* const fromp = vtxp->fromp();
|
|
|
|
|
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
|
|
|
|
const uint32_t lsb = vtxp->lsb();
|
|
|
|
|
const uint32_t width = vtxp->width();
|
|
|
|
|
const uint32_t msb = lsb + width - 1;
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const constp = fromp->cast<DfgConst>()) {
|
|
|
|
|
APPLYING(FOLD_SEL) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgConst* const resp = makeZero(flp, width);
|
|
|
|
|
resp->num().opSel(constp->num(), msb, lsb);
|
|
|
|
|
replace(resp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Full width select, replace with the source.
|
|
|
|
|
if (fromp->width() == width) {
|
2024-09-09 15:09:29 +02:00
|
|
|
UASSERT_OBJ(lsb == 0, fromp, "Out of range select should have been fixed up earlier");
|
2022-10-06 19:34:18 +02:00
|
|
|
APPLYING(REMOVE_FULL_WIDTH_SEL) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(fromp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Concat
|
|
|
|
|
if (DfgConcat* const concatp = fromp->cast<DfgConcat>()) {
|
|
|
|
|
DfgVertex* const lhsp = concatp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = concatp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (msb < rhsp->width()) {
|
|
|
|
|
// If the select is entirely from rhs, then replace with sel from rhs
|
|
|
|
|
APPLYING(REMOVE_SEL_FROM_RHS_OF_CONCAT) { //
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgSel>(vtxp, rhsp, vtxp->lsb()));
|
|
|
|
|
return;
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
} else if (lsb >= rhsp->width()) {
|
|
|
|
|
// If the select is entirely from the lhs, then replace with sel from lhs
|
|
|
|
|
APPLYING(REMOVE_SEL_FROM_LHS_OF_CONCAT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgSel>(vtxp, lhsp, lsb - rhsp->width()));
|
|
|
|
|
return;
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgReplicate* const repp = fromp->cast<DfgReplicate>()) {
|
|
|
|
|
// If the Sel is wholly into the source of the Replicate, push the Sel through the
|
|
|
|
|
// Replicate and apply it directly to the source of the Replicate.
|
|
|
|
|
const uint32_t srcWidth = repp->srcp()->width();
|
|
|
|
|
if (width <= srcWidth) {
|
|
|
|
|
const uint32_t newLsb = lsb % srcWidth;
|
|
|
|
|
if (newLsb + width <= srcWidth) {
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_REPLICATE) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgSel>(vtxp, repp->srcp(), newLsb));
|
|
|
|
|
return;
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Not
|
|
|
|
|
if (DfgNot* const notp = fromp->cast<DfgNot>()) {
|
|
|
|
|
// Replace "Sel from Not" with "Not of Sel"
|
|
|
|
|
if (!notp->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_NOT) {
|
|
|
|
|
// Make Sel select from source of Not
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgSel* const newSelp = make<DfgSel>(vtxp, notp->srcp(), vtxp->lsb());
|
2022-10-06 19:34:18 +02:00
|
|
|
// Add Not after Sel
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(notp->fileline(), vtxp->dtype(), newSelp));
|
|
|
|
|
return;
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Sel
|
|
|
|
|
if (DfgSel* const selp = fromp->cast<DfgSel>()) {
|
|
|
|
|
APPLYING(REPLACE_SEL_FROM_SEL) {
|
2026-03-23 10:09:47 +01:00
|
|
|
// Select from the source of the source Sel with adjusted LSB
|
|
|
|
|
replace(make<DfgSel>(vtxp, selp->fromp(), lsb + selp->lsb()));
|
|
|
|
|
return;
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from Cond
|
|
|
|
|
if (DfgCond* const condp = fromp->cast<DfgCond>()) {
|
|
|
|
|
// If at least one of the branches are a constant, push the select past the cond
|
2025-06-08 12:12:39 +02:00
|
|
|
if (!condp->hasMultipleSinks()
|
|
|
|
|
&& (condp->thenp()->is<DfgConst>() || condp->elsep()->is<DfgConst>())) {
|
2022-10-06 19:34:18 +02:00
|
|
|
APPLYING(PUSH_SEL_THROUGH_COND) {
|
|
|
|
|
// The new 'then' vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgSel* const newThenp = make<DfgSel>(vtxp, condp->thenp(), lsb);
|
2022-10-06 19:34:18 +02:00
|
|
|
|
|
|
|
|
// The new 'else' vertex
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgSel* const newElsep = make<DfgSel>(vtxp, condp->elsep(), lsb);
|
2022-10-06 19:34:18 +02:00
|
|
|
|
|
|
|
|
// The replacement Cond vertex
|
2025-09-07 21:38:50 +02:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(condp->fileline(), vtxp->dtype(),
|
2024-03-06 19:01:52 +01:00
|
|
|
condp->condp(), newThenp, newElsep);
|
2022-10-06 19:34:18 +02:00
|
|
|
|
|
|
|
|
// Replace this vertex
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(newCondp);
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sel from ShiftL
|
|
|
|
|
if (DfgShiftL* const shiftLp = fromp->cast<DfgShiftL>()) {
|
|
|
|
|
// If selecting bottom bits of left shift, push the Sel before the shift
|
|
|
|
|
if (lsb == 0) {
|
|
|
|
|
UASSERT_OBJ(shiftLp->lhsp()->width() >= width, vtxp, "input of shift narrow");
|
|
|
|
|
APPLYING(PUSH_SEL_THROUGH_SHIFTL) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgSel* const newSelp = make<DfgSel>(vtxp, shiftLp->lhsp(), vtxp->lsb());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgShiftL>(vtxp, newSelp, shiftLp->rhsp()));
|
|
|
|
|
return;
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
// Sel from a partial variable or narrowed vertex
|
|
|
|
|
{
|
|
|
|
|
DfgSplicePacked* splicep = fromp->cast<DfgSplicePacked>();
|
|
|
|
|
if (DfgVarPacked* const varp = fromp->cast<DfgVarPacked>()) {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// Must be a splice, otherwise it would have been inlined
|
2026-03-23 10:09:47 +01:00
|
|
|
if (varp->tmpForp() && varp->srcp()) splicep = varp->srcp()->as<DfgSplicePacked>();
|
|
|
|
|
}
|
|
|
|
|
if (splicep) {
|
2026-03-23 16:06:07 +01:00
|
|
|
DfgVertex* driverp = nullptr;
|
|
|
|
|
uint32_t driverLsb = 0;
|
2025-09-02 17:50:40 +02:00
|
|
|
splicep->foreachDriver([&](DfgVertex& src, const uint32_t dLsb) {
|
|
|
|
|
const uint32_t dMsb = dLsb + src.width() - 1;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
// If it does not cover the whole searched bit range, move on
|
2025-09-02 17:50:40 +02:00
|
|
|
if (lsb < dLsb || dMsb < msb) return false;
|
2026-03-23 16:06:07 +01:00
|
|
|
// Save the driver
|
|
|
|
|
driverp = &src;
|
|
|
|
|
driverLsb = dLsb;
|
2025-09-02 17:50:40 +02:00
|
|
|
return true;
|
|
|
|
|
});
|
2026-03-23 16:06:07 +01:00
|
|
|
if (driverp) {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
APPLYING(PUSH_SEL_THROUGH_SPLICE) {
|
2026-03-23 16:06:07 +01:00
|
|
|
replace(make<DfgSel>(vtxp, driverp, lsb - driverLsb));
|
2026-02-14 13:33:20 +01:00
|
|
|
return;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-06 19:34:18 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexBinary - bitwise
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgAnd* const vtxp) override {
|
2025-07-07 17:25:29 +02:00
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
2025-09-07 21:38:50 +02:00
|
|
|
if (isSame(lhsp, rhsp)) {
|
2025-07-07 17:25:29 +02:00
|
|
|
APPLYING(REMOVE_AND_WITH_SELF) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(lhsp);
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
2025-06-15 17:00:11 +02:00
|
|
|
// Bubble pushing (De Morgan)
|
|
|
|
|
if (!lhsp->hasMultipleSinks() && !rhsp->hasMultipleSinks()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const rhsNotp = rhsp->cast<DfgNot>()) {
|
|
|
|
|
APPLYING(REPLACE_AND_OF_NOT_AND_NOT) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgOr* const orp = make<DfgOr>(vtxp, lhsNotp->srcp(), rhsNotp->srcp());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp, orp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
2022-10-04 19:26:53 +02:00
|
|
|
if (DfgNeq* const rhsNeqp = rhsp->cast<DfgNeq>()) {
|
|
|
|
|
APPLYING(REPLACE_AND_OF_NOT_AND_NEQ) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgEq* const newRhsp = make<DfgEq>(rhsp, rhsNeqp->lhsp(), rhsNeqp->rhsp());
|
|
|
|
|
DfgOr* const orp = make<DfgOr>(vtxp, lhsNotp->srcp(), newRhsp);
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp, orp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const lhsConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (lhsConstp->isZero()) {
|
|
|
|
|
APPLYING(REPLACE_AND_WITH_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(lhsConstp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (lhsConstp->isOnes()) {
|
|
|
|
|
APPLYING(REMOVE_AND_WITH_ONES) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(rhsp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (tryPushBitwiseOpThroughConcat(vtxp, lhsConstp, rhsConcatp)) return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-07 17:25:29 +02:00
|
|
|
if (distributiveAndAssociativeBinary<DfgOr, DfgAnd>(vtxp)) return;
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (tryPushBitwiseOpThroughReductions(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
// ~A & A is all zeroes
|
|
|
|
|
if (lhsNotp->srcp() == rhsp) {
|
|
|
|
|
APPLYING(REPLACE_CONTRADICTORY_AND) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(makeZero(flp, vtxp->width()));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-07-07 17:25:29 +02:00
|
|
|
|
|
|
|
|
// ~A & (A & _) or ~A & (_ & A) is all zeroes
|
|
|
|
|
if (DfgAnd* const rhsAndp = rhsp->cast<DfgAnd>()) {
|
|
|
|
|
if (lhsNotp->srcp() == rhsAndp->lhsp() || lhsNotp->srcp() == rhsAndp->rhsp()) {
|
|
|
|
|
APPLYING(REPLACE_CONTRADICTORY_AND_3) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(makeZero(flp, vtxp->width()));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgOr* const vtxp) override {
|
2025-07-07 17:25:29 +02:00
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
2025-09-07 21:38:50 +02:00
|
|
|
if (isSame(lhsp, rhsp)) {
|
2025-07-07 17:25:29 +02:00
|
|
|
APPLYING(REMOVE_OR_WITH_SELF) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(lhsp);
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
|
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
2025-06-15 17:00:11 +02:00
|
|
|
// Bubble pushing (De Morgan)
|
|
|
|
|
if (!lhsp->hasMultipleSinks() && !rhsp->hasMultipleSinks()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const rhsNotp = rhsp->cast<DfgNot>()) {
|
|
|
|
|
APPLYING(REPLACE_OR_OF_NOT_AND_NOT) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgAnd* const andp = make<DfgAnd>(vtxp, lhsNotp->srcp(), rhsNotp->srcp());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp, andp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-10-04 19:26:53 +02:00
|
|
|
if (DfgNeq* const rhsNeqp = rhsp->cast<DfgNeq>()) {
|
|
|
|
|
APPLYING(REPLACE_OR_OF_NOT_AND_NEQ) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgEq* const newRhsp = make<DfgEq>(rhsp, rhsNeqp->lhsp(), rhsNeqp->rhsp());
|
|
|
|
|
DfgAnd* const andp = make<DfgAnd>(vtxp, lhsNotp->srcp(), newRhsp);
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp, andp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConcat* const lhsConcatp = lhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
2025-09-07 21:38:50 +02:00
|
|
|
if (lhsConcatp->lhsp()->dtype() == rhsConcatp->lhsp()->dtype()) {
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(lhsConcatp->lhsp()) && isZero(rhsConcatp->rhsp())) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_OR_OF_CONCAT_ZERO_LHS_AND_CONCAT_RHS_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgConcat>(vtxp, rhsConcatp->lhsp(), lhsConcatp->rhsp()));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(lhsConcatp->rhsp()) && isZero(rhsConcatp->lhsp())) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_OR_OF_CONCAT_LHS_ZERO_AND_CONCAT_ZERO_RHS) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgConcat>(vtxp, lhsConcatp->lhsp(), rhsConcatp->rhsp()));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const lhsConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (lhsConstp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_OR_WITH_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(rhsp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (lhsConstp->isOnes()) {
|
|
|
|
|
APPLYING(REPLACE_OR_WITH_ONES) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(lhsp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (tryPushBitwiseOpThroughConcat(vtxp, lhsConstp, rhsConcatp)) return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-07 17:25:29 +02:00
|
|
|
if (distributiveAndAssociativeBinary<DfgAnd, DfgOr>(vtxp)) return;
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (tryPushBitwiseOpThroughReductions(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (DfgNot* const lhsNotp = lhsp->cast<DfgNot>()) {
|
|
|
|
|
// ~A | A is all ones
|
|
|
|
|
if (lhsNotp->srcp() == rhsp) {
|
|
|
|
|
APPLYING(REPLACE_TAUTOLOGICAL_OR) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgConst* const resp = makeZero(flp, vtxp->width());
|
|
|
|
|
resp->num().setAllBits1();
|
|
|
|
|
replace(resp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-07-07 17:25:29 +02:00
|
|
|
|
|
|
|
|
// ~A | (A | _) or ~A | (_ | A) is all ones
|
|
|
|
|
if (DfgOr* const rhsOrp = rhsp->cast<DfgOr>()) {
|
|
|
|
|
if (lhsNotp->srcp() == rhsOrp->lhsp() || lhsNotp->srcp() == rhsOrp->rhsp()) {
|
|
|
|
|
APPLYING(REPLACE_TAUTOLOGICAL_OR_3) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgConst* const resp = makeZero(flp, vtxp->width());
|
|
|
|
|
resp->num().setAllBits1();
|
|
|
|
|
replace(resp);
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgXor* const vtxp) override {
|
2025-07-07 17:25:29 +02:00
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
2025-09-07 21:38:50 +02:00
|
|
|
if (isSame(lhsp, rhsp)) {
|
2025-07-07 17:25:29 +02:00
|
|
|
APPLYING(REPLACE_XOR_WITH_SELF) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(makeZero(vtxp->fileline(), vtxp->width()));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (DfgConst* const lConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (lConstp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_XOR_WITH_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(rhsp);
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (lConstp->isOnes()) {
|
|
|
|
|
APPLYING(REPLACE_XOR_WITH_ONES) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp, rhsp));
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (DfgConcat* const rConcatp = rhsp->cast<DfgConcat>()) {
|
2024-03-06 19:01:52 +01:00
|
|
|
if (tryPushBitwiseOpThroughConcat(vtxp, lConstp, rConcatp)) return;
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-09-30 17:19:53 +02:00
|
|
|
|
|
|
|
|
if (tryPushBitwiseOpThroughReductions(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexBinary - other
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgAdd* const vtxp) override {
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgArraySel* const vtxp) override {
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
DfgConst* const idxp = vtxp->bitp()->cast<DfgConst>();
|
|
|
|
|
if (!idxp) return;
|
|
|
|
|
DfgVarArray* const varp = vtxp->fromp()->cast<DfgVarArray>();
|
|
|
|
|
if (!varp) return;
|
|
|
|
|
if (varp->varp()->isForced()) return;
|
2025-08-22 22:43:49 +02:00
|
|
|
if (varp->varp()->isSigUserRWPublic()) return;
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
DfgVertex* const srcp = varp->srcp();
|
|
|
|
|
if (!srcp) return;
|
|
|
|
|
|
|
|
|
|
if (DfgSpliceArray* const splicep = srcp->cast<DfgSpliceArray>()) {
|
|
|
|
|
DfgVertex* const driverp = splicep->driverAt(idxp->toSizeT());
|
|
|
|
|
if (!driverp) return;
|
|
|
|
|
DfgUnitArray* const uap = driverp->cast<DfgUnitArray>();
|
|
|
|
|
if (!uap) return;
|
|
|
|
|
if (uap->srcp()->is<DfgVertexSplice>()) return;
|
|
|
|
|
// If driven by a variable that had a Driver in DFG, it is partial
|
|
|
|
|
if (DfgVertexVar* const dvarp = uap->srcp()->cast<DfgVertexVar>()) {
|
|
|
|
|
if (dvarp->srcp()) return;
|
|
|
|
|
}
|
|
|
|
|
APPLYING(INLINE_ARRAYSEL_SPLICE) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(uap->srcp());
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgUnitArray* const uap = srcp->cast<DfgUnitArray>()) {
|
|
|
|
|
UASSERT_OBJ(idxp->toSizeT() == 0, vtxp, "Array index out of range");
|
|
|
|
|
if (uap->srcp()->is<DfgSplicePacked>()) return;
|
|
|
|
|
// If driven by a variable that had a Driver in DFG, it is partial
|
|
|
|
|
if (DfgVertexVar* const dvarp = uap->srcp()->cast<DfgVertexVar>()) {
|
|
|
|
|
if (dvarp->srcp()) return;
|
|
|
|
|
}
|
|
|
|
|
APPLYING(INLINE_ARRAYSEL_UNIT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(uap->srcp());
|
Optimize complex combinational logic in DFG (#6298)
This patch adds DfgLogic, which is a vertex that represents a whole,
arbitrarily complex combinational AstAlways or AstAssignW in the
DfgGraph.
Implementing this requires computing the variables live at entry to the
AstAlways (variables read by the block), so there is a new
ControlFlowGraph data structure and a classical data-flow analysis based
live variable analysis to do that at the variable level (as opposed to
bit/element level).
The actual CFG construction and live variable analysis is best effort,
and might fail for currently unhandled constructs or data types. This
can be extended later.
V3DfgAstToDfg is changed to convert the Ast into an initial DfgGraph
containing only DfgLogic, DfgVertexSplice and DfgVertexVar vertices.
The DfgLogic are then subsequently synthesized into primitive operations
by the new V3DfgSynthesize pass, which is a combination of the old
V3DfgAstToDfg conversion and new code to handle AstAlways blocks with
complex flow control.
V3DfgSynthesize by default will synthesize roughly the same constructs
as V3DfgAstToDfg used to handle before, plus any logic that is part of a
combinational cycle within the DfgGraph. This enables breaking up these
cycles, for which there are extensions to V3DfgBreakCycles in this patch
as well. V3DfgSynthesize will then delete all non synthesized or non
synthesizable DfgLogic vertices and the rest of the Dfg pipeline is
identical, with minor changes to adjust for the changed representation.
Because with this change we can now eliminate many more UNOPTFLAT, DFG
has been disabled in all the tests that specifically target testing the
scheduling and reporting of circular combinational logic.
2025-08-19 16:06:38 +02:00
|
|
|
return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgConcat* const vtxp) override {
|
2022-10-06 13:02:46 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
2022-10-04 22:18:39 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(lhsp)) {
|
2022-10-05 11:36:28 +02:00
|
|
|
DfgConst* const lConstp = lhsp->as<DfgConst>();
|
|
|
|
|
if (DfgSel* const rSelp = rhsp->cast<DfgSel>()) {
|
2025-09-07 21:38:50 +02:00
|
|
|
if (vtxp->dtype() == rSelp->fromp()->dtype() && rSelp->lsb() == lConstp->width()) {
|
2022-10-06 19:34:18 +02:00
|
|
|
APPLYING(REPLACE_CONCAT_ZERO_AND_SEL_TOP_WITH_SHIFTR) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(
|
|
|
|
|
make<DfgShiftR>(vtxp, rSelp->fromp(), makeI32(flp, lConstp->width())));
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-05 11:36:28 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(rhsp)) {
|
2022-10-05 11:36:28 +02:00
|
|
|
DfgConst* const rConstp = rhsp->as<DfgConst>();
|
|
|
|
|
if (DfgSel* const lSelp = lhsp->cast<DfgSel>()) {
|
2025-09-07 21:38:50 +02:00
|
|
|
if (vtxp->dtype() == lSelp->fromp()->dtype() && lSelp->lsb() == 0) {
|
2022-10-06 19:34:18 +02:00
|
|
|
APPLYING(REPLACE_CONCAT_SEL_BOTTOM_AND_ZERO_WITH_SHIFTL) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(
|
|
|
|
|
make<DfgShiftL>(vtxp, lSelp->fromp(), makeI32(flp, rConstp->width())));
|
2022-10-06 19:34:18 +02:00
|
|
|
return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-10-05 11:36:28 +02:00
|
|
|
if (DfgNot* const lNot = lhsp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const rNot = rhsp->cast<DfgNot>()) {
|
|
|
|
|
if (!lNot->hasMultipleSinks() && !rNot->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(PUSH_CONCAT_THROUGH_NOTS) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgConcat* const newCatp
|
|
|
|
|
= make<DfgConcat>(vtxp, lNot->srcp(), rNot->srcp());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp, newCatp));
|
2022-10-05 11:36:28 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
if (DfgSel* const lSelp = lhsp->cast<DfgSel>()) {
|
|
|
|
|
if (DfgSel* const rSelp = rhsp->cast<DfgSel>()) {
|
2025-09-03 11:26:47 +02:00
|
|
|
if (isSame(lSelp->fromp(), rSelp->fromp())) {
|
2022-10-06 19:34:18 +02:00
|
|
|
if (lSelp->lsb() == rSelp->lsb() + rSelp->width()) {
|
2026-03-23 16:06:07 +01:00
|
|
|
APPLYING(REMOVE_CONCAT_OF_ADJOINING_SELS) {
|
|
|
|
|
replace(make<DfgSel>(vtxp, rSelp->fromp(), rSelp->lsb()));
|
|
|
|
|
return;
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2026-03-23 16:06:07 +01:00
|
|
|
|
|
|
|
|
if (DfgConcat* const rConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (DfgSel* const rlSelp = rConcatp->lhsp()->cast<DfgSel>()) {
|
|
|
|
|
if (isSame(lSelp->fromp(), rlSelp->fromp())) {
|
|
|
|
|
if (lSelp->lsb() == rlSelp->lsb() + rlSelp->width()) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(REPLACE_NESTED_CONCAT_OF_ADJOINING_SELS_ON_LHS) {
|
2026-03-23 16:06:07 +01:00
|
|
|
const uint32_t width = lSelp->width() + rlSelp->width();
|
|
|
|
|
const DfgDataType& dtype = DfgDataType::packed(width);
|
|
|
|
|
DfgSel* const selp
|
|
|
|
|
= make<DfgSel>(flp, dtype, rlSelp->fromp(), rlSelp->lsb());
|
|
|
|
|
replace(make<DfgConcat>(vtxp, selp, rConcatp->rhsp()));
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2026-03-23 16:06:07 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgSel* const rSelp = rhsp->cast<DfgSel>()) {
|
|
|
|
|
if (DfgConcat* const lConcatp = lhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (DfgSel* const lrSelp = lConcatp->rhsp()->cast<DfgSel>()) {
|
|
|
|
|
if (isSame(lrSelp->fromp(), rSelp->fromp())) {
|
|
|
|
|
if (lrSelp->lsb() == rSelp->lsb() + rSelp->width()) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(REPLACE_NESTED_CONCAT_OF_ADJOINING_SELS_ON_RHS) {
|
2026-03-23 16:06:07 +01:00
|
|
|
const uint32_t width = lrSelp->width() + rSelp->width();
|
|
|
|
|
const DfgDataType& dtype = DfgDataType::packed(width);
|
|
|
|
|
DfgSel* const selp
|
|
|
|
|
= make<DfgSel>(flp, dtype, rSelp->fromp(), rSelp->lsb());
|
|
|
|
|
replace(make<DfgConcat>(vtxp, lConcatp->lhsp(), selp));
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2026-01-31 18:28:14 +01:00
|
|
|
|
2026-02-14 13:25:17 +01:00
|
|
|
if (DfgConst* const lConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (DfgCond* const rCondp = rhsp->cast<DfgCond>()) {
|
|
|
|
|
if (!rCondp->hasMultipleSinks()) {
|
|
|
|
|
DfgVertex* const rtVtxp = rCondp->thenp();
|
|
|
|
|
DfgVertex* const reVtxp = rCondp->elsep();
|
|
|
|
|
APPLYING(PUSH_CONCAT_THROUGH_COND_LHS) {
|
|
|
|
|
DfgConcat* const thenp
|
|
|
|
|
= make<DfgConcat>(rtVtxp->fileline(), vtxp->dtype(), lConstp, rtVtxp);
|
|
|
|
|
DfgConcat* const elsep
|
|
|
|
|
= make<DfgConcat>(reVtxp->fileline(), vtxp->dtype(), lConstp, reVtxp);
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgCond>(vtxp, rCondp->condp(), thenp, elsep));
|
2026-02-14 13:25:17 +01:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const rConstp = rhsp->cast<DfgConst>()) {
|
|
|
|
|
if (DfgCond* const lCondp = lhsp->cast<DfgCond>()) {
|
|
|
|
|
if (!lCondp->hasMultipleSinks()) {
|
|
|
|
|
DfgVertex* const ltVtxp = lCondp->thenp();
|
|
|
|
|
DfgVertex* const leVtxp = lCondp->elsep();
|
|
|
|
|
APPLYING(PUSH_CONCAT_THROUGH_COND_RHS) {
|
|
|
|
|
DfgConcat* const thenp
|
|
|
|
|
= make<DfgConcat>(ltVtxp->fileline(), vtxp->dtype(), ltVtxp, rConstp);
|
|
|
|
|
DfgConcat* const elsep
|
|
|
|
|
= make<DfgConcat>(leVtxp->fileline(), vtxp->dtype(), leVtxp, rConstp);
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgCond>(vtxp, lCondp->condp(), thenp, elsep));
|
2026-02-14 13:25:17 +01:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-01-31 18:28:14 +01:00
|
|
|
// Attempt to narrow a concatenation that produces unused bits on the edges
|
|
|
|
|
{
|
|
|
|
|
const uint32_t vMsb = vtxp->width() - 1; // MSB of the concatenation
|
|
|
|
|
const uint32_t lLsb = vtxp->rhsp()->width(); // LSB of the LHS
|
|
|
|
|
const uint32_t rMsb = lLsb - 1; // MSB of the RHS
|
|
|
|
|
// Check each sink, and record the range of bits used by them
|
|
|
|
|
uint32_t lsb = vMsb; // LSB used by a sink
|
|
|
|
|
uint32_t msb = 0; // MSB used by a sink
|
|
|
|
|
bool hasCrossSink = false; // True if some sinks use bits from both sides
|
|
|
|
|
vtxp->foreachSink([&](DfgVertex& sink) {
|
|
|
|
|
// Record bits used by DfgSel sinks
|
|
|
|
|
if (const DfgSel* const selp = sink.cast<DfgSel>()) {
|
|
|
|
|
const uint32_t selLsb = selp->lsb();
|
|
|
|
|
const uint32_t selMsb = selLsb + selp->width() - 1;
|
|
|
|
|
lsb = std::min(lsb, selLsb);
|
|
|
|
|
msb = std::max(msb, selMsb);
|
|
|
|
|
hasCrossSink |= selMsb >= lLsb && rMsb >= selLsb;
|
|
|
|
|
return false;
|
|
|
|
|
}
|
2026-03-12 00:53:23 +01:00
|
|
|
// Ignore non-observable variable sinks. These will be eliminated.
|
2026-01-31 18:28:14 +01:00
|
|
|
if (const DfgVarPacked* const varp = sink.cast<DfgVarPacked>()) {
|
|
|
|
|
if (!varp->hasSinks() && !varp->isObserved()) return false;
|
|
|
|
|
}
|
|
|
|
|
// Otherwise the whole value is used
|
|
|
|
|
lsb = 0;
|
|
|
|
|
msb = vMsb;
|
|
|
|
|
return true;
|
|
|
|
|
});
|
2026-03-23 10:09:47 +01:00
|
|
|
if (hasCrossSink && (vMsb > msb || lsb > 0)) {
|
2026-01-31 18:28:14 +01:00
|
|
|
APPLYING(NARROW_CONCAT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
// Narrowed RHS
|
|
|
|
|
DfgVertex* nRhsp = rhsp;
|
|
|
|
|
if (lsb != 0)
|
|
|
|
|
nRhsp = make<DfgSel>(flp, DfgDataType::packed(rMsb - lsb + 1), rhsp, lsb);
|
|
|
|
|
// Narrowed LHS
|
|
|
|
|
DfgVertex* nLhsp = lhsp;
|
|
|
|
|
if (msb != vMsb)
|
|
|
|
|
nLhsp = make<DfgSel>(flp, DfgDataType::packed(msb - lLsb + 1), lhsp, 0U);
|
|
|
|
|
// Narrowed concatenation
|
|
|
|
|
DfgVertex* const catp
|
|
|
|
|
= make<DfgConcat>(flp, DfgDataType::packed(msb - lsb + 1), nLhsp, nRhsp);
|
|
|
|
|
|
|
|
|
|
// Need to insert via a partial splice to avoid infinite matching,
|
|
|
|
|
// this splice will be eliminated on later visits to its sinks.
|
|
|
|
|
DfgVertex::ScopeCache scopeCache;
|
|
|
|
|
DfgSplicePacked* const sp = new DfgSplicePacked{m_dfg, flp, vtxp->dtype()};
|
|
|
|
|
sp->addDriver(catp, lsb, flp);
|
|
|
|
|
replace(sp);
|
2026-01-31 18:28:14 +01:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgDiv* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgDivS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgEq* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const lhsConstp = lhsp->cast<DfgConst>()) {
|
|
|
|
|
if (DfgConcat* const rhsConcatp = rhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (tryPushCompareOpThroughConcat(vtxp, lhsConstp, rhsConcatp)) return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgGt* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgGtS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgGte* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgGteS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLogAnd* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
2025-07-07 17:25:29 +02:00
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (lhsp->width() == 1 && rhsp->width() == 1) {
|
|
|
|
|
APPLYING(REPLACE_LOGAND_WITH_AND) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgAnd>(vtxp, lhsp, rhsp));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLogEq* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLogIf* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLogOr* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
2025-07-07 17:25:29 +02:00
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (lhsp->width() == 1 && rhsp->width() == 1) {
|
|
|
|
|
APPLYING(REPLACE_LOGOR_WITH_OR) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgOr>(vtxp, lhsp, rhsp));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLt* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLtS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLte* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgLteS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgModDiv* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgModDivS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgMul* const vtxp) override {
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgMulS* const vtxp) override {
|
2022-10-05 11:36:28 +02:00
|
|
|
if (associativeBinary(vtxp)) return;
|
|
|
|
|
|
2024-03-06 19:01:52 +01:00
|
|
|
if (commutativeBinary(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgNeq* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgPow* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgPowSS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgPowSU* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgPowUS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgReplicate* const vtxp) override {
|
2025-09-07 21:38:50 +02:00
|
|
|
if (vtxp->dtype() == vtxp->srcp()->dtype()) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(REMOVE_REPLICATE_ONCE) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(vtxp->srcp());
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgShiftL* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
2024-03-06 19:01:52 +01:00
|
|
|
if (optimizeShiftRHS(vtxp)) return;
|
2026-02-14 13:25:17 +01:00
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const rConstp = rhsp->cast<DfgConst>()) {
|
|
|
|
|
if (DfgConcat* const lConcatp = lhsp->cast<DfgConcat>()) {
|
|
|
|
|
if (!lConcatp->hasMultipleSinks()
|
|
|
|
|
&& lConcatp->lhsp()->width() == rConstp->toU32()) {
|
|
|
|
|
APPLYING(REPLACE_SHIFTL_CAT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgVertex* const zerop
|
|
|
|
|
= makeZero(lConcatp->fileline(), lConcatp->lhsp()->width());
|
|
|
|
|
replace(make<DfgConcat>(vtxp, lConcatp->rhsp(), zerop));
|
2026-02-14 13:25:17 +01:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgShiftR* const lShiftRp = lhsp->cast<DfgShiftR>()) {
|
|
|
|
|
if (!lShiftRp->hasMultipleSinks() && isSame(rConstp, lShiftRp->rhsp())) {
|
2026-03-23 10:09:47 +01:00
|
|
|
if (DfgConcat* const llCatp = lShiftRp->lhsp()->cast<DfgConcat>()) {
|
2026-02-14 13:25:17 +01:00
|
|
|
const uint32_t shiftAmount = rConstp->toU32();
|
2026-03-23 10:09:47 +01:00
|
|
|
if (!llCatp->hasMultipleSinks()
|
|
|
|
|
&& llCatp->rhsp()->width() == shiftAmount) {
|
2026-02-14 13:25:17 +01:00
|
|
|
APPLYING(REPLACE_SHIFTRL_CAT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgConst* const zerop = makeZero(llCatp->fileline(), shiftAmount);
|
|
|
|
|
replace(make<DfgConcat>(vtxp, llCatp->lhsp(), zerop));
|
2026-02-14 13:25:17 +01:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgShiftR* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
2024-03-06 19:01:52 +01:00
|
|
|
if (optimizeShiftRHS(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgShiftRS* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
2024-03-06 19:01:52 +01:00
|
|
|
if (optimizeShiftRHS(vtxp)) return;
|
2022-10-04 22:18:39 +02:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgSub* const vtxp) override {
|
2022-10-04 22:18:39 +02:00
|
|
|
if (foldBinary(vtxp)) return;
|
|
|
|
|
|
|
|
|
|
DfgVertex* const lhsp = vtxp->lhsp();
|
|
|
|
|
DfgVertex* const rhsp = vtxp->rhsp();
|
|
|
|
|
|
|
|
|
|
if (DfgConst* const rConstp = rhsp->cast<DfgConst>()) {
|
|
|
|
|
if (rConstp->isZero()) {
|
|
|
|
|
APPLYING(REMOVE_SUB_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(lhsp);
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-09-07 21:38:50 +02:00
|
|
|
if (vtxp->dtype() == m_bitDType && rConstp->hasValue(1)) {
|
2022-10-04 22:18:39 +02:00
|
|
|
APPLYING(REPLACE_SUB_WITH_NOT) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(vtxp->fileline(), m_bitDType, lhsp));
|
2022-10-04 22:18:39 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
//=========================================================================
|
|
|
|
|
// DfgVertexTernary
|
|
|
|
|
//=========================================================================
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgCond* const vtxp) override {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
DfgVertex* const condp = vtxp->condp();
|
|
|
|
|
DfgVertex* const thenp = vtxp->thenp();
|
|
|
|
|
DfgVertex* const elsep = vtxp->elsep();
|
2022-10-04 22:18:39 +02:00
|
|
|
FileLine* const flp = vtxp->fileline();
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-07 21:38:50 +02:00
|
|
|
if (condp->dtype() != m_bitDType) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isOnes(condp)) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REMOVE_COND_WITH_TRUE_CONDITION) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(thenp);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(condp)) {
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REMOVE_COND_WITH_FALSE_CONDITION) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(elsep);
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-07 17:25:29 +02:00
|
|
|
if (isSame(thenp, elsep)) {
|
|
|
|
|
APPLYING(REMOVE_COND_WITH_BRANCHES_SAME) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(elsep);
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
if (DfgNot* const condNotp = condp->cast<DfgNot>()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!condp->hasMultipleSinks() || condNotp->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(SWAP_COND_WITH_NOT_CONDITION) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgCond>(vtxp, condNotp->srcp(), elsep, thenp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgNeq* const condNeqp = condp->cast<DfgNeq>()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
if (!condp->hasMultipleSinks()) {
|
|
|
|
|
APPLYING(SWAP_COND_WITH_NEQ_CONDITION) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgEq* const newCondp = make<DfgEq>(condp, condNeqp->lhsp(), condNeqp->rhsp());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgCond>(vtxp, newCondp, elsep, thenp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (DfgNot* const thenNotp = thenp->cast<DfgNot>()) {
|
|
|
|
|
if (DfgNot* const elseNotp = elsep->cast<DfgNot>()) {
|
2022-11-04 16:50:20 +01:00
|
|
|
if (!thenNotp->srcp()->is<DfgConst>() && !elseNotp->srcp()->is<DfgConst>()
|
2022-11-22 14:48:49 +01:00
|
|
|
&& !thenNotp->hasMultipleSinks() && !elseNotp->hasMultipleSinks()) {
|
2022-10-04 19:26:53 +02:00
|
|
|
APPLYING(PULL_NOTS_THROUGH_COND) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgCond* const newCondp = make<DfgCond>(
|
|
|
|
|
vtxp, vtxp->condp(), thenNotp->srcp(), elseNotp->srcp());
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgNot>(thenp->fileline(), vtxp->dtype(), newCondp));
|
2022-10-04 19:26:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-07-07 17:25:29 +02:00
|
|
|
if (DfgOr* const condOrp = condp->cast<DfgOr>()) {
|
|
|
|
|
if (DfgCond* const thenCondp = thenp->cast<DfgCond>()) {
|
|
|
|
|
if (!thenCondp->hasMultipleSinks()) {
|
|
|
|
|
if (condOrp->lhsp() == thenCondp->condp()) {
|
|
|
|
|
// '(a | b) ? (a ? x : y) : z' -> 'a ? x : b ? y : z'
|
|
|
|
|
APPLYING(REPLACE_COND_OR_THEN_COND_LHS) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgCond* const ep = make<DfgCond>(thenCondp, condOrp->rhsp(),
|
|
|
|
|
thenCondp->elsep(), elsep);
|
|
|
|
|
replace(make<DfgCond>(vtxp, condOrp->lhsp(), thenCondp->thenp(), ep));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (condOrp->rhsp() == thenCondp->condp()) {
|
|
|
|
|
// '(a | b) ? (a ? x : y) : z' -> 'a ? x : b ? y : z'
|
|
|
|
|
APPLYING(REPLACE_COND_OR_THEN_COND_RHS) {
|
2026-03-23 10:09:47 +01:00
|
|
|
DfgCond* const ep = make<DfgCond>(thenCondp, condOrp->lhsp(),
|
|
|
|
|
thenCondp->elsep(), elsep);
|
|
|
|
|
replace(make<DfgCond>(vtxp, condOrp->rhsp(), thenCondp->thenp(), ep));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2022-09-30 17:19:53 +02:00
|
|
|
if (vtxp->width() > 1) {
|
|
|
|
|
// 'cond ? a + 1 : a' -> 'a + cond'
|
|
|
|
|
if (DfgAdd* const thenAddp = thenp->cast<DfgAdd>()) {
|
|
|
|
|
if (DfgConst* const constp = thenAddp->lhsp()->cast<DfgConst>()) {
|
2022-10-28 18:00:32 +02:00
|
|
|
if (constp->hasValue(1)) {
|
2022-09-30 17:19:53 +02:00
|
|
|
if (thenAddp->rhsp() == elsep) {
|
|
|
|
|
APPLYING(REPLACE_COND_INC) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgConcat* const extp = make<DfgConcat>(
|
|
|
|
|
vtxp, makeZero(flp, vtxp->width() - 1), condp);
|
2022-09-30 17:19:53 +02:00
|
|
|
FileLine* const thenFlp = thenAddp->fileline();
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(
|
|
|
|
|
make<DfgAdd>(thenFlp, vtxp->dtype(), thenAddp->rhsp(), extp));
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// 'cond ? a - 1 : a' -> 'a - cond'
|
|
|
|
|
if (DfgSub* const thenSubp = thenp->cast<DfgSub>()) {
|
|
|
|
|
if (DfgConst* const constp = thenSubp->rhsp()->cast<DfgConst>()) {
|
2022-10-28 18:00:32 +02:00
|
|
|
if (constp->hasValue(1)) {
|
2022-09-30 17:19:53 +02:00
|
|
|
if (thenSubp->lhsp() == elsep) {
|
|
|
|
|
APPLYING(REPLACE_COND_DEC) {
|
2024-03-06 19:01:52 +01:00
|
|
|
DfgConcat* const extp = make<DfgConcat>(
|
|
|
|
|
vtxp, makeZero(flp, vtxp->width() - 1), condp);
|
2022-09-30 17:19:53 +02:00
|
|
|
FileLine* const thenFlp = thenSubp->fileline();
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(
|
|
|
|
|
make<DfgSub>(thenFlp, vtxp->dtype(), thenSubp->lhsp(), extp));
|
2022-09-30 17:19:53 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2025-09-07 21:38:50 +02:00
|
|
|
if (vtxp->dtype() == m_bitDType) {
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(thenp)) { // a ? 0 : b becomes ~a & b
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_COND_WITH_THEN_BRANCH_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgAnd>(vtxp, make<DfgNot>(vtxp, condp), elsep));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-07-07 17:25:29 +02:00
|
|
|
if (thenp == condp) { // a ? a : b becomes a | b
|
|
|
|
|
APPLYING(REPLACE_COND_WITH_THEN_BRANCH_COND) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgOr>(vtxp, condp, elsep));
|
2025-07-07 17:25:29 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isOnes(thenp)) { // a ? 1 : b becomes a | b
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_COND_WITH_THEN_BRANCH_ONES) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgOr>(vtxp, condp, elsep));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isZero(elsep)) { // a ? b : 0 becomes a & b
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_COND_WITH_ELSE_BRANCH_ZERO) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgAnd>(vtxp, condp, thenp));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
2025-09-02 17:50:40 +02:00
|
|
|
if (isOnes(elsep)) { // a ? b : 1 becomes ~a | b
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
APPLYING(REPLACE_COND_WITH_ELSE_BRANCH_ONES) {
|
2026-03-23 10:09:47 +01:00
|
|
|
replace(make<DfgOr>(vtxp, make<DfgNot>(vtxp, condp), thenp));
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-03-23 10:09:47 +01:00
|
|
|
void visit(DfgVertexVar* const vtxp) override {
|
2026-02-14 13:33:20 +01:00
|
|
|
if (vtxp->hasSinks()) return;
|
|
|
|
|
if (vtxp->isObserved()) return;
|
|
|
|
|
if (vtxp->defaultp()) return;
|
|
|
|
|
|
|
|
|
|
// If undriven, or driven from another var, it is completely redundant.
|
|
|
|
|
if (!vtxp->srcp() || vtxp->srcp()->is<DfgVertexVar>()) {
|
|
|
|
|
APPLYING(REMOVE_VAR) {
|
|
|
|
|
deleteVertex(vtxp);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Otherwise remove if there is only one sink that is not a removable variable
|
|
|
|
|
bool foundOne = false;
|
|
|
|
|
const bool keep = vtxp->srcp()->foreachSink([&](DfgVertex& sink) {
|
2026-03-12 00:53:23 +01:00
|
|
|
// Ignore non-observable variable sinks. These can be eliminated.
|
2026-02-14 13:33:20 +01:00
|
|
|
if (const DfgVertexVar* const varp = sink.cast<DfgVertexVar>()) {
|
|
|
|
|
if (!varp->hasSinks() && !varp->isObserved()) return false;
|
|
|
|
|
}
|
|
|
|
|
if (foundOne) return true;
|
|
|
|
|
foundOne = true;
|
|
|
|
|
return false;
|
|
|
|
|
});
|
|
|
|
|
if (!keep) {
|
|
|
|
|
APPLYING(REMOVE_VAR) {
|
|
|
|
|
deleteVertex(vtxp);
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
V3DfgPeephole(DfgGraph& dfg, V3DfgPeepholeContext& ctx)
|
|
|
|
|
: m_dfg{dfg}
|
2022-10-06 13:02:46 +02:00
|
|
|
, m_ctx{ctx} {
|
|
|
|
|
|
2026-03-23 15:41:36 +01:00
|
|
|
// Initialize the work list
|
|
|
|
|
m_wlist.reserve(m_dfg.size() * 4);
|
2026-03-23 16:06:07 +01:00
|
|
|
|
2026-03-23 15:41:36 +01:00
|
|
|
// Need a nullptr at index 0 so VertexInfo::m_workListIndex can be used to check membership
|
|
|
|
|
m_wlist.push_back(nullptr);
|
|
|
|
|
|
2026-02-14 13:33:20 +01:00
|
|
|
// Add all variable vertices to the work list. Do this first so they are processed last.
|
|
|
|
|
// This order has a better chance of preserving original variables in case they are needed.
|
|
|
|
|
for (DfgVertexVar& vtx : m_dfg.varVertices()) addToWorkList(&vtx);
|
|
|
|
|
|
2026-03-23 15:41:36 +01:00
|
|
|
// Add all operation vertices to the work list
|
2026-03-22 11:19:31 +01:00
|
|
|
for (DfgVertex& vtx : m_dfg.opVertices()) addToWorkList(&vtx);
|
2022-11-04 16:50:20 +01:00
|
|
|
|
|
|
|
|
// Process the work list
|
2026-03-23 15:41:36 +01:00
|
|
|
while (!m_wlist.empty()) {
|
2026-03-23 10:09:47 +01:00
|
|
|
VL_RESTORER(m_vtxp);
|
|
|
|
|
|
2026-03-23 15:41:36 +01:00
|
|
|
// Pop up head of work list
|
|
|
|
|
m_vtxp = m_wlist.back();
|
|
|
|
|
m_wlist.pop_back();
|
|
|
|
|
if (!m_vtxp) continue; // If removed from worklist, move on
|
|
|
|
|
m_vInfo[m_vtxp].m_workListIndex = 0; // No longer on work list
|
|
|
|
|
|
|
|
|
|
// Variables are special, just visit them, the visit might delete them
|
|
|
|
|
if (DfgVertexVar* const varp = m_vtxp->cast<DfgVertexVar>()) {
|
|
|
|
|
visit(varp);
|
|
|
|
|
continue;
|
2026-03-23 10:09:47 +01:00
|
|
|
}
|
|
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
// Unsued vertices should have been removed immediately
|
|
|
|
|
UASSERT_OBJ(m_vtxp->hasSinks(), m_vtxp, "Operation vertex should have sinks");
|
2026-03-23 10:09:47 +01:00
|
|
|
|
|
|
|
|
// Check if an equivalent vertex exists, if so replace this vertex with it
|
2026-03-23 15:41:36 +01:00
|
|
|
if (DfgVertex* const sampep = m_cache.cache(m_vtxp)) {
|
2026-03-23 16:06:07 +01:00
|
|
|
APPLYING(REPLACE_WITH_EQUIVALENT) {
|
|
|
|
|
replace(sampep);
|
|
|
|
|
continue;
|
|
|
|
|
}
|
2026-03-23 10:09:47 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Visit vertex, might get deleted in the process
|
2026-03-23 15:41:36 +01:00
|
|
|
iterate(m_vtxp);
|
|
|
|
|
}
|
2022-10-06 13:02:46 +02:00
|
|
|
}
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
|
2026-03-23 16:06:07 +01:00
|
|
|
#undef APPLYING
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
public:
|
2022-10-06 13:02:46 +02:00
|
|
|
static void apply(DfgGraph& dfg, V3DfgPeepholeContext& ctx) { V3DfgPeephole{dfg, ctx}; }
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
};
|
|
|
|
|
|
2026-03-21 11:13:27 +01:00
|
|
|
V3DebugBisect V3DfgPeephole::s_debugBisect{"DfgPeephole"};
|
|
|
|
|
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
void V3DfgPasses::peephole(DfgGraph& dfg, V3DfgPeepholeContext& ctx) {
|
2025-09-10 13:38:49 +02:00
|
|
|
if (!v3Global.opt.fDfgPeephole()) return;
|
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 17:46:22 +02:00
|
|
|
V3DfgPeephole::apply(dfg, ctx);
|
|
|
|
|
}
|