100 Commits

Author SHA1 Message Date
Wilson Snyder fd12ab3413 Fix interface exposure with `--public-depth` or `--trace-depth` (#5758). 2025-09-23 22:05:51 -04:00
Geza Lore 327d55d13d
Internals: Fix remaining cppcheck errors (#6319)
Fixed the non const-related issue and added suppressions for the const
ones. With that `make cppcheck` should be clean.
2025-08-21 09:43:37 +01:00
Geza Lore d1f71f2342
Internals: Improve V3Rtti for cppcheck (#6312)
Rewrite with much less running around in the templates. Use private
methods only + friend functions that do the actual type check. This
avoids cppcheck warnings.
2025-08-19 23:05:34 +01:00
Wilson Snyder 680236b03e Internals: Redo post-error additional information to be part of error calls. 2025-05-10 16:20:12 -04:00
Wilson Snyder 8fbb725f34 Copyright year update. 2025-01-01 08:30:25 -05:00
Wilson Snyder 0c820c3068 Internals: Standardize template argument names. No functional change. 2024-11-29 20:20:38 -05:00
Bartłomiej Chmiel ffe76717c6
Thread pool rewrite (#5161)
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
Signed-off-by: Bartłomiej Chmiel <bchmiel@antmicro.com>
Signed-off-by: Arkadiusz Kozdra <akozdra@antmicro.com>
Co-authored-by: Krzysztof Bieganski <kbieganski@antmicro.com>
Co-authored-by: Arkadiusz Kozdra <akozdra@antmicro.com>
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
2024-08-23 08:36:49 -04:00
Geza Lore 98206a4f04
Improve V3List user interface (#4996) 2024-03-25 23:06:25 +00:00
Geza Lore 6ffff8565f
Use the same serial ordering within MTasks as we use in serial mode (#4994)
The goal here is to use a single ordering heuristic (which can be
improved later) within MTasks as we do for serial code ordering. The
heuristic itself is factored out into the new OrderMoveGraphSerializer.
This also yields slightly nicer ordering than the previously used
GraphStream, so we end up with fewer trigger (domain) conditionals in
the MTasks; this can be worth a few percent speedup.

This has the somewhat nice side-effect of reusing OrderMoveGraphVertex
for both serial and parallel mode, so MTaskMoveGraphVertex can be
removed.

Serial mode yields identical output.
2024-03-17 13:15:39 +00:00
Geza Lore c0391990ad Increase graph ParallelismReport values to uint64_t 2024-03-10 18:56:31 +00:00
Geza Lore a686e547cf
Factor out graph parallelism report into a generic algorithm (#4957)
This is a generic algorithm parametrised by a cost function, so
implement it as such for easy reuse.
2024-03-10 14:56:43 +00:00
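
Note: the idea of a report "parametrised by a cost function" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the commit: the `Dag` and `ParallelismReport` types, the `parallelismReport` name and signature, the requirement that vertices are numbered in topological order, and the choice to report total cost, critical path cost, and their ratio.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal DAG: edges[v] lists the successors of vertex v.
// Assumption: vertices are numbered in topological order, so every edge
// goes from a lower index to a higher index.
struct Dag {
    std::vector<std::vector<size_t>> edges;
};

// Hypothetical result type for this sketch
struct ParallelismReport {
    uint64_t totalCost = 0;  // Sum of all vertex costs
    uint64_t criticalPathCost = 0;  // Cost of the most expensive path
    double parallelism() const {
        return criticalPathCost ? static_cast<double>(totalCost) / criticalPathCost : 0.0;
    }
};

// Generic over the cost function: any callable mapping a vertex index to a cost
template <typename T_CostFn>
ParallelismReport parallelismReport(const Dag& dag, T_CostFn costOf) {
    ParallelismReport report;
    const size_t n = dag.edges.size();
    // On reaching v, pathCost[v] holds the most expensive path cost among its
    // predecessors; adding v's own cost makes it the best path ending at v.
    std::vector<uint64_t> pathCost(n, 0);
    for (size_t v = 0; v < n; ++v) {
        const uint64_t cost = costOf(v);
        report.totalCost += cost;
        pathCost[v] += cost;
        report.criticalPathCost = std::max(report.criticalPathCost, pathCost[v]);
        for (const size_t succ : dag.edges[v]) {
            assert(succ > v);  // Relies on the topological numbering assumption
            pathCost[succ] = std::max(pathCost[succ], pathCost[v]);
        }
    }
    return report;
}
```

Because the cost function is a template parameter, the same routine could be reused with different cost models (for example, a per-vertex instruction estimate) without changing the traversal.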
Wilson Snyder e76f29e5ba Copyright year update 2024-01-01 03:19:59 -05:00
Wilson Snyder 9fd5634778 Internals: Remove unneeded private's. No functional change 2023-11-13 21:37:45 -05:00
Wilson Snyder c8063e5732 Internals: Misc cleanups in V3Graph and V3Dead. No functional change. 2023-11-12 22:08:08 -05:00
Wilson Snyder 7ba6647c4f Internals: Cleanup some V3Graph constructors/funcs and docs. No functional change. 2023-10-28 20:11:28 -04:00
Wilson Snyder bcbe5059a9 Internal: V3Graph style cleanup. No functional change 2023-10-22 09:50:38 -04:00
Mariusz Glebocki 28bd7e5b19
Rework multithreading handling to separate by code units that use/never use it. (#4228) 2023-09-24 22:12:23 -04:00
Wilson Snyder d72f1b89fc Internals: Minor internal code coverage cleanups 2023-09-10 18:53:51 -04:00
Krzysztof Bieganski ffbbd438ae
Internals: Use runtime type info instead of `dynamic_cast` for faster graph type checks (#4397) 2023-08-31 18:00:53 -04:00
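
Note: one common way to replace `dynamic_cast` with a cheaper check is a per-class ID compared through a single virtual call. The sketch below shows that general technique only; the class names, `classId`, `isInstanceOf`, `is`, and `as` are hypothetical and are not claimed to match the mechanism this commit actually introduced.

```cpp
// Hypothetical vertex hierarchy; names are illustrative only.
class Vertex {
public:
    virtual ~Vertex() = default;
    // The address of a function-local static serves as a unique per-class ID
    static const void* classId() {
        static const char s_tag = 0;
        return &s_tag;
    }
    // True if this object is an instance of the class with the given ID,
    // or of one of its subclasses
    virtual bool isInstanceOf(const void* id) const { return id == classId(); }

    // Type test: one virtual call plus pointer comparisons
    template <typename T_Target>
    bool is() const {
        return isInstanceOf(T_Target::classId());
    }
    // Checked downcast: returns nullptr when the type does not match
    template <typename T_Target>
    T_Target* as() {
        return is<T_Target>() ? static_cast<T_Target*>(this) : nullptr;
    }
};

class LogicVertex final : public Vertex {
public:
    static const void* classId() {
        static const char s_tag = 0;
        return &s_tag;
    }
    // Check own ID first, then defer to the base class chain
    bool isInstanceOf(const void* id) const override {
        return id == classId() || Vertex::isInstanceOf(id);
    }
};
```

In this sketch every subclass must repeat the `classId`/`isInstanceOf` pair, which real code bases typically hide behind a macro or helper template.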
Anthony Donlon cf6566b9bc
Internal: Optimize program size by refactoring error reporting routines (#4446) 2023-08-29 16:54:32 -04:00
Kamil Rakoczy 93d50c4499
Internals: Add mutex to V3Error (#3680) 2023-02-09 22:15:37 -05:00
Wilson Snyder b24d7c83d3 Copyright year update 2023-01-01 10:18:39 -05:00
Wilson Snyder a0e7930036 docs: Fix spelling 2022-12-09 22:39:41 -05:00
Wilson Snyder 833780fac1 Internal: cppcheck fixes. No functional change intended. 2022-11-27 05:52:40 -05:00
Wilson Snyder 0c75d4eaca Internals: Fix constructor style. 2022-11-10 22:58:27 -05:00
Geza Lore 050060b139 Make enum constructors and operators constexpr 2022-09-23 11:10:28 +01:00
Geza Lore 63c694f65f Streamline dump control options
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
  special case of `--dumpi-<tag>` where tag can be a magic word, or a
  filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
  works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
  special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
2022-09-22 17:24:41 +01:00
Geza Lore 38a8d7fb2e Remove redundant 'inline' keywords from definitions
Also add checks to t/t_dist_cppstyle
2022-09-16 15:52:25 +01:00
Geza Lore 9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order in
which some candidates are added or removed. This can cause perturbation
in the output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
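
Note: for readers unfamiliar with pairing heaps, here is a minimal sketch of an intrusive min pairing heap, where the heap node lives inside the object being prioritized so no side table is needed. It is a textbook two-pass variant written for illustration; `HeapNode`, `Candidate`, and `MinPairingHeap` are hypothetical names, and nothing here is claimed to match the implementation in this commit (which may, for instance, order by maximum rather than minimum).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical intrusive heap node
struct HeapNode {
    uint64_t key = 0;
    HeapNode* child = nullptr;  // First child
    HeapNode* next = nullptr;  // Next sibling
};

// Hypothetical stand-in for a merge candidate that embeds its own heap node
struct Candidate {
    HeapNode node;
};

class MinPairingHeap {
    HeapNode* m_root = nullptr;

    // Link two roots: the one with the larger key becomes the first child of the other
    static HeapNode* meld(HeapNode* a, HeapNode* b) {
        if (!a) return b;
        if (!b) return a;
        if (b->key < a->key) {
            HeapNode* const tmp = a;
            a = b;
            b = tmp;
        }
        b->next = a->child;
        a->child = b;
        return a;
    }

    // Two-pass pairing: meld siblings in pairs, then meld the results together.
    // Recursive for brevity; a production version would likely iterate.
    static HeapNode* mergePairs(HeapNode* first) {
        if (!first) return nullptr;
        HeapNode* const second = first->next;
        if (!second) return first;
        HeapNode* const rest = second->next;
        first->next = nullptr;
        second->next = nullptr;
        return meld(meld(first, second), mergePairs(rest));
    }

public:
    bool empty() const { return !m_root; }
    HeapNode* min() const { return m_root; }
    // Constant-time insertion: meld a single-node heap with the root
    void insert(HeapNode* nodep) {
        nodep->child = nullptr;
        nodep->next = nullptr;
        m_root = meld(m_root, nodep);
    }
    // Remove and return the minimum; the work of combining children happens here
    HeapNode* removeMin() {
        assert(m_root);
        HeapNode* const top = m_root;
        m_root = mergePairs(top->child);
        top->child = nullptr;
        return top;
    }
};
```

Insertion never sorts anything, so candidates that are never popped cost only a constant-time link, which matches the motivation stated in the commit message of not sorting candidates that are never considered.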
Geza Lore 4d81eb021d Revert "Improve performance of MTask coarsening"
This reverts commit 83475008d9.
2022-08-19 18:03:45 +01:00
Geza Lore 83475008d9 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order in
which some candidates are added or removed. This can cause perturbation
in the output due to tied scores being broken based on IDs.
2022-08-19 16:59:20 +01:00
Geza Lore cd50949a7e Reuse MTaskEdge instances in MT scheduling
Instead of deleting then re-allocating MTaskEdge instances when merging
two MTasks, just redirect the edges of the donor MTask to the recipient
MTask. This is both faster as it avoids an allocation and a deletion,
together with one update of the sibling maps, and also makes the
algorithm more stable due to MergeCandidate IDs being stable and
allocated up front for all MTaskEdges, before any SiblingMCs are
allocated.

Perturbations in output are expected as the IDs used to break ties
between merge candidates with equal costs are not updated when
redirecting an edge (on purpose). The relinking of only one end of the
graph edges also perturbs the order in which they are enumerated, which
does change candidate opportunities when the number of edges is larger
than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when
IDs are updated and edges are updated to appear in their original order.
2022-08-19 14:06:11 +01:00
Geza Lore b436794773 Add specialized GraphStreamUnordered
GraphStreamUnordered used to be GraphStream<std::less<const
V3GraphVertex*>>, but a lot of performance improvements can be had by a
specialized implementation, so added a highly optimized one. This helps
a lot with --debug-partition.
2022-08-19 14:06:11 +01:00
Geza Lore a2792785fe Add V3GraphVertex::dotRank to add GraphViz ranks to graph dumps
This is a simple debugging aid to allow constraining the graph layout
via GraphViz rank directives. Note this is not related in any way to the
vertex 'rank' attribute used by some of the graph algorithms.

No functional change.
2022-05-02 10:27:26 +01:00
Geza Lore 2ba9eb4228 Speed up TSP sort implementation
- More efficient comparison by pre-computing sorting keys.
- Remove work items in algorithms known to be redundant earlier.
  This greatly reduces data structure sizes.
- Use V3GraphVertex->user() for state tracking instead of unordered_map.
  While both of these are constant time, they do add up.
- In `makeMinSpanningTree`, instead of batch inserting outgoing edges of
  each visited vertex into an ordered set, keep an ordered set of sorted
  vectors of edges. This reduces the size of the ordered set
  significantly (it is now O(V) rather than O(E), and as the subject
  graph is a complete graph, V ~ sqrt(E), so this is a significant gain).
- Use a vector + sorting in `perfectMatching` instead of an ordered set.
  This is faster on large working sets.

This yields 3.8x speedup on the variable order pass and overall 14%
verilation speed gain on a large design.
2022-01-07 12:05:52 +00:00
Wilson Snyder ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Geza Lore 987ce927eb Remove unused code. No functional change. 2021-11-09 19:46:19 +00:00
Wilson Snyder 3a55600913 Internals: Restyle with C++11 `using` replacing `typedef` 2021-03-12 18:10:45 -05:00
Wilson Snyder be31fdcfe4 Use Google-style-guide header guard naming, to avoid __ prefix. 2021-03-03 21:57:07 -05:00
Wilson Snyder bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder c23de458ed Misc internal coverage cleanups 2020-12-08 08:40:22 -05:00
Wilson Snyder b6ded59c2b Internals: Use and enforce class final for ~5% performance boost. 2020-11-18 21:32:16 -05:00
Wilson Snyder 1b0a48ea02 Internals: Use C++11 = default where obvious. No functional change intended. 2020-11-16 19:56:16 -05:00
Wilson Snyder b67f1f0e94 Fix GCC warnings 2020-08-18 08:10:44 -04:00
Wilson Snyder 78aee6f4e7 C++11: Use sized enums (+4% performance). 2020-08-16 12:05:35 -04:00
Wilson Snyder 034737d2a8 C++11: Use member declaration initializations (in nodes). No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder 5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Wilson Snyder 1ce360ed5b Add SPDX license identifiers. No functional change. 2020-03-21 11:24:24 -04:00
Wilson Snyder 0aabe6ce00 Internals: Fix cppcheck warning including missing init. 2020-02-03 22:10:29 -05:00