Commit Graph

24 Commits

Author SHA1 Message Date
Wilson Snyder 3a5248a919 Internals: Mark structs final/VL_NOT_FINAL. No functional change intended. 2024-01-20 15:06:46 -05:00
Wilson Snyder e76f29e5ba Copyright year update 2024-01-01 03:19:59 -05:00
Wilson Snyder b5828a7ce9 Fix header order botched by clang-format in recent commit. 2023-10-18 06:37:46 -04:00
github action 770cd24f27 Apply 'make format' 2023-10-18 02:50:27 +00:00
Wilson Snyder 431bb1ed16
Support compiling Verilator with gcc/clang precompiled headers (#4579) 2023-10-17 22:49:28 -04:00
Mariusz Glebocki 28bd7e5b19
Rework multithreading handling to separate by code units that use/never use it. (#4228) 2023-09-24 22:12:23 -04:00
Wilson Snyder b24d7c83d3 Copyright year update 2023-01-01 10:18:39 -05:00
Wilson Snyder 21d80cdfa1 Internals: Fix constructor style. 2022-11-19 14:45:38 -05:00
Geza Lore 9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
Geza Lore 4d81eb021d Revert "Improve performance of MTask coarsening"
This reverts commit 83475008d9.
2022-08-19 18:03:45 +01:00
Geza Lore 83475008d9 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-19 16:59:20 +01:00
Wilson Snyder ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Wilson Snyder bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder b6ded59c2b Internals: Use and enforce class final for ~5% performance boost. 2020-11-18 21:32:16 -05:00
Wilson Snyder 1b0a48ea02 Internals: Use C++11 = default where obvious. No functional change intended. 2020-11-16 19:56:16 -05:00
Wilson Snyder 72d2cff0a1 C++11: Use member declaration initalizations. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder 5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Wilson Snyder 1ce360ed5b Add SPDX license identifiers. No functional change. 2020-03-21 11:24:24 -04:00
Wilson Snyder f23fe8fd84 Update copyright year. 2020-01-06 18:05:53 -05:00
Wilson Snyder 5811ec07e6 Update URLs to https://verilator.org 2019-11-07 22:33:59 -05:00
Wilson Snyder 771a301f66 Commentary: Remove newlines, upsets some patches. No functional change. 2019-10-04 20:17:11 -04:00
Wilson Snyder 8a4aeddbb0 Copyright year update. 2019-01-03 19:17:22 -05:00
Wilson Snyder e37dce9d85 Internals: Add new graph algs for future partitioning. 2018-07-15 22:09:27 -04:00