Commit Graph

204 Commits

Author SHA1 Message Date
Kamil Rakoczy d6126c4b32
Remove --no-threads; require --threads 1 for single threaded (#3703). 2022-11-05 08:47:34 -04:00
Kamil Rakoczy 54e3f15dce
Internals: Add attribute when using clang to VL_MT_SAFE and VL_MT_UNSAFE (#3685) 2022-10-18 05:15:33 -04:00
Wilson Snyder 4367e03e46 Internals: Make VL_UNREACHABLE similar to std::unreachable() 2022-10-02 16:35:45 -04:00
Wilson Snyder c9634695a7 Fix std::exchange for C++11 compilers 2022-10-02 16:25:11 -04:00
Mariusz Glebocki fc3ce29845
Improve Verilation memory by reducing V3Number size (#3521) 2022-09-20 16:46:47 -04:00
Geza Lore 38a8d7fb2e Remove redundant 'inline' keywords from definitions
Also add checks to t/t_dist_cppstyle
2022-09-16 15:52:25 +01:00
Geza Lore 9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
Wilson Snyder 1e2219347e Internals: Cleanup ifdef, move up not under compilver version ifdef 2022-08-11 17:41:43 -04:00
Geza Lore 96a4b3e5a5 Update clang-format config and apply
- Regroup and sort #include directives (like we used to, but automatic)
- Set AlwaysBreakTemplateDeclarations to true
2022-08-05 12:00:24 +01:00
Geza Lore 38e5b6c1ad Replace __gcov_flush with __gcov_dump
__gcov_flush was a private function and was removed from later GCC
versions (at least from 11.2.0, possibly earlier). Replace with the
documented public __gcov_dump.
2022-07-30 16:02:03 +01:00
Wilson Snyder b9d7819faa Internals: Fix some cppcheck issues. Some dump functions fixed. 2022-07-30 10:01:39 -04:00
Geza Lore 0de1bbc85b Add and use VL_CONSTEXPR_CXX17 2022-07-05 14:21:28 +01:00
Geza Lore b51f887567
Perform VCD tracing in parallel when using --threads (#3449)
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.

This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.

This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
2022-05-29 19:08:39 +01:00
HungMingWu 880a9be3b1
Internal: Add C++20ish reverse_view for range loops. No functional change (#3388).
Signed-off-by: HungMingWu <u9089000@gmail.com>
2022-04-18 13:03:56 -04:00
Wilson Snyder e02f97854c Deprecate 'vluint64_t' and similar types (#3255). 2022-03-27 15:27:40 -04:00
Wilson Snyder 3f7bf3d2dc Fix MSVC localtime_s (#3124). 2022-03-27 13:59:18 -04:00
Geza Lore b1b5b5dfe2 Improve run-time profiling
The --prof-threads option has been split into two independent options:
1. --prof-exec, for collecting verilator_gantt and other execution
related profiling data, and
2. --prof-pgo, for collecting data needed for PGO

The implementation of execution profiling is extricated from
VlThreadPool and is now a separate class VlExecutionProfiler. This means
--prof-exec can now be used for single-threaded models (though it does
not measure a lot of things just yet). For consistency VerilatedProfiler
is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are
in verilated_profiler.{h/cpp}, but can be used completely independently.

Also re-worked the execution profile format so it now only emits events
without holding onto any temporaries. This is in preparation for some
future optimizations that would be hindered by the introduction of function
locals via AstText.

Also removed the Barrier event. Clearing the profile buffers is not
notably more expensive as the profiling records are trivially
destructible.
2022-03-27 15:57:30 +02:00
Xi Zhang 14d24213a8
Support LoongArch ISA multithreading (#3353) (#3354) 2022-03-17 09:04:47 -04:00
Wilson Snyder 321880f5a6 Add trace dumpvars() call for selective runtime tracing (#3322). 2022-03-05 15:44:32 -05:00
Wilson Snyder 50094ca296 Internals: Add cpplint control file and related cleanups 2022-01-09 16:49:38 -05:00
Wilson Snyder 4cd56b1fb9 Use C++11 standard types for MacOS portability (#3254) (#3257). 2022-01-01 16:04:20 -05:00
Wilson Snyder ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Wilson Snyder 560b59f97f Use C++11 standard types for MacOS portability (#3254). 2021-12-21 13:18:05 -05:00
Wilson Snyder 293a5f402b Fix timescale portability on Arm64 (#3222). 2021-11-28 15:47:19 -05:00
Wilson Snyder 55da66164b Fix verilator_gantt time on Arm. 2021-10-04 22:13:34 -04:00
Wilson Snyder 959793cde3 Internals: Cleanup VL_VALUE_STRING_MAX widths (#3050). 2021-08-23 21:13:33 -04:00
Wilson Snyder 3718fe1ca1 Commentary (trigger rebuild) 2021-05-13 18:34:20 -04:00
Yutetsu TAKATSUKASA 9797af0ad4
Introduce a macro VL_ATTR_NO_SANITIZE_ALIGN to suppress unaligned access check in ubsan (#2929)
* Add VL_ATTR_NO_SANITIZE_ALIGN macro to disable alignment check of ubsan

* Mark a function VL_ATTR_NO_SANITIZE_ALIGN because the function is intentionally using unaligned access for the sake of performance.

Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
2021-05-08 07:16:40 +09:00
HyungKi Jeong 0d6099b2b7
Fix MinGW not supportting 'localtime_r'. (#2882) 2021-04-09 10:40:41 -04:00
Wilson Snyder 8992e2ec02 Commentary 2021-03-28 11:50:05 -04:00
Wilson Snyder e9b5721fb0 Internals: Remove VL_FUNC as __func__ part of C++11 2021-03-28 11:14:51 -04:00
Wilson Snyder ca01d6f18d Internals: Add some std::'s. No functional change intended. 2021-03-26 21:23:18 -04:00
Wilson Snyder 2e158d88c1 Commentary. Remove dox comments from private members, 2021-03-20 21:11:53 -04:00
Wilson Snyder a1ab295b74 Commentary: Cleanup all include/* header comments. 2021-03-20 17:46:00 -04:00
Wilson Snyder 8c3ad591ae Internals: Add additional mutex exclusion checks. No functional change. 2021-03-06 18:29:11 -05:00
Wilson Snyder 47dcbd4b8a Internal: Remove deprecated/insecure functions. No functional change intended. 2021-03-06 10:34:03 -05:00
Wilson Snyder 018d994781 Convert VPI to singleton, part of (#2660). 2021-03-04 19:23:40 -05:00
Wilson Snyder be31fdcfe4 Use Google-style-guide header guard naming, to avoid __ prefix. 2021-03-03 21:57:07 -05:00
Wilson Snyder 8c2ee6c5ab With -DVL_NO_LEGACY hide all outdated API routines 2021-02-22 22:59:23 -05:00
Wilson Snyder bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder 941e5c659a Fix cppcheck parse error 2020-12-23 15:22:02 -05:00
Wilson Snyder b6ded59c2b Internals: Use and enforce class final for ~5% performance boost. 2020-11-18 21:32:16 -05:00
Wilson Snyder 698e0fbbd1 configure: Try compiler flags to get to C++11 (#2502) 2020-08-17 07:40:07 -04:00
Wilson Snyder ee9d6dd63f C++11: Favor auto, range for. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder 7c54a451a9 C++11: Remove pre-c11 VL_OVERRIDE etc. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder f3b28c5c74 Remove configure --enable-prec11-final 2020-08-15 09:39:59 -04:00
Wilson Snyder b1495f0742 Commentary (#2423) 2020-06-14 10:44:57 -04:00
Wilson Snyder c5d61da5d2 Internal coverage: Fix coverage of tests that abort. No functional change intended. 2020-06-05 08:00:22 -04:00
Wilson Snyder b60a92eed3 Fix pre-C11 compiler warning. 2020-05-30 21:11:37 -04:00
Wilson Snyder 4cfa3f879a Internals: Allow VL_DANGLING on pointer const. 2020-05-29 18:31:53 -04:00
Wilson Snyder 5089ac6119 Remove VL_ULL as ULL now in MSVC & C++11 2020-05-28 20:32:07 -04:00
Wilson Snyder 9fd4541069 Fix reduction OR on wide data, broke in v4.026, #2300. 2020-04-30 17:53:54 -04:00
Geza Lore aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Wilson Snyder df52e481fb Collected minor output code cleanups. 2020-04-23 21:22:47 -04:00
Wilson Snyder 5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Wilson Snyder 763f621d4c Deprecate VL_ULL. 2020-04-05 16:45:53 -04:00
Wilson Snyder 5302a9d0e6 Internals: clang-format cleanups. No functional change. 2020-04-04 17:55:37 -04:00
Wilson Snyder 38a31ae168 Cleanup misc clang-tidy warnings. No functional change intended 2020-04-03 22:31:54 -04:00
Wilson Snyder 4361c4b57a Add vlsint8_t types. 2020-03-31 21:30:18 -04:00
Wilson Snyder 1ce360ed5b Add SPDX license identifiers. No functional change. 2020-03-21 11:24:24 -04:00
Wilson Snyder 23eb96579c Fix duplicate __STDC_FORMAT_MACROS definition. 2020-02-24 18:11:41 -05:00
Wilson Snyder a8ad97eef2 Add VL_CACHE_LINE_BYTES and use 64 as defult. 2020-02-01 20:28:03 -05:00
Wilson Snyder 68fa82fb14 Support $typename, and use to cleanup error messages. 2020-01-26 13:21:25 -05:00
Yutetsu TAKATSUKASA fbdf5f2dad Internals: Mark all visit() with VL_OVERRIDE. Closes #2132.
* Add VL_OVERRIDE macro so that compiler can tell my typo when trying to override a function.

* Mark visit() with VL_OVERRIDE. No functional change intended.
2020-01-21 17:35:56 -05:00
Matthew Ballance d87648c258 Correct FST-support issue with some GCC versions. Closes #2129. 2020-01-20 17:33:40 -05:00
Wilson Snyder 09199f79a6 Internals: Add VL_DO_CLEAR delete protections. No functional change intended. 2020-01-18 10:29:49 -05:00
Wilson Snyder 623c4ec103 Internals: Create VL_DO_DANGLING. No functional change intended. 2020-01-16 20:17:11 -05:00
Wilson Snyder f23fe8fd84 Update copyright year. 2020-01-06 18:05:53 -05:00
Stefan Wallentowitz 3ac6745658 Add vpiTimeUnit and allow to specify time as string, bug1636.
Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>
2019-12-13 19:11:37 -05:00
Yutetsu TAKATSUKASA c2037ddbc5 Support string compare, icompare, ato* methods, bug1606.
Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>
2019-12-09 19:17:52 -05:00
Wilson Snyder 700f2072c0 Framework for WDatas being vectors of 64-bit EDatas, but not supporting this at this time. 2019-12-08 21:36:38 -05:00
Wilson Snyder f87107e757 Tests etc: Cleanup some clang-format suggestions. No functional change. 2019-11-09 20:35:12 -05:00
Wilson Snyder 5811ec07e6 Update URLs to https://verilator.org 2019-11-07 22:33:59 -05:00
Wilson Snyder 9f977ed419 Codacy/Cppcheck cleanups and badge. 2019-10-24 21:48:45 -04:00
Patrick Stewart 8e6d68147c Support multithreading on Windows.
Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>
2019-10-07 19:23:06 -04:00
Wilson Snyder 771a301f66 Commentary: Remove newlines, upsets some patches. No functional change. 2019-10-04 20:17:11 -04:00
Wilson Snyder 6b798830c9 Fix spacing 2019-07-29 21:07:37 -04:00
Wilson Snyder 01ef7122e9 Internals: Add lcov code coverage markers. 2019-06-30 22:37:03 -04:00
Wilson Snyder 2cedd14d43 Fix build error on MinGW, bug1460. 2019-06-11 21:38:17 -04:00
Wilson Snyder afea6d84e3 Mark infrequently called functions with GCC cold attribute. 2019-05-14 22:03:50 -04:00
Wilson Snyder b23fc06388 Internals: Detab and fix spacing style issues in some include files. No functional change. 2019-05-07 23:30:24 -04:00
Sergey Kvachonok d8a020905a Fix MinGW GCC 6 printf formats, bug1413.
Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>
2019-04-02 18:24:36 -04:00
Wilson Snyder f700a73b3e Fix cygwin warning on fstapi, msg2770. 2019-01-20 14:16:09 -05:00
Wilson Snyder 8a4aeddbb0 Copyright year update. 2019-01-03 19:17:22 -05:00
Wilson Snyder 8170391573 Internals: Fix spacing of comments. No functional change. 2018-11-28 19:59:10 -05:00
Wilson Snyder 304a24d03a Internals: Fix many clang-tidy issues. No functional change intended. 2018-10-14 18:39:33 -04:00
Wilson Snyder d87b9d25ca Internals: Cleanup and standardize include order. No functional change intended. 2018-10-14 13:59:40 -04:00
Wilson Snyder 4684f1d4cc Include sys/types to make sure __WORDSIZE is set, bug1333. 2018-08-31 06:52:12 -04:00
Wilson Snyder 0ef3baa87d Merge from master 2018-07-22 11:53:58 -04:00
Wilson Snyder 3a777c2a37 Verilated: Always define VL_RDTSC, though unused; useful for benchmarking. 2018-07-22 11:53:35 -04:00
Wilson Snyder ec82243ee6 Fix shrinking new unordered_set_map to save memory. No end-result change intended. 2018-06-15 20:57:07 -04:00
Wilson Snyder 1c5c9e2435 cppcheck fixes 2018-06-12 21:14:20 -04:00
John Coiner c124fe6d0c Add clones of std::unordered_map and std::unordered_set for pre-C++11 compilers. 2018-06-11 22:05:15 -04:00
Wilson Snyder 92649ba494 includes: Fix VL_RDTSC & misc stuff. 2018-05-24 22:17:44 -04:00
Wilson Snyder 0ef3c10931 Pull some thread include changes from thread branch. 2018-05-08 21:43:32 -04:00
Wilson Snyder 8e65d93d6d Copyright year update. No functional change. 2018-01-02 18:05:06 -05:00
Wilson Snyder add5cc8b56 Internals: Add VL_UNCOPYABLE to make classes uncopyable. No functional change intended. 2017-11-01 18:51:41 -04:00
Wilson Snyder f91bac7b31 Rewrite include libraries to support VL_THREADED towards future threading 2017-10-26 21:51:51 -04:00
Wilson Snyder ad693d67b2 Mark Verilated functions with multithreaded status. No functional change. 2017-10-26 20:05:42 -04:00