1615 Commits

Author SHA1 Message Date
Geza Lore fe708f045a Fix Travis oddity 2020-05-04 00:21:07 +01:00
Geza Lore 8afcd67a1f Fix FST tracing of little endian vectors 2020-05-03 22:39:45 +01:00
Wilson Snyder 8f64e4a76f Support $root, #2150. 2020-05-02 08:29:20 -04:00
John Demme 6e9008fb5a
Fix VerilatedVarProps::totalSize missing the first unpacked dim (#2296) 2020-05-01 07:42:29 -04:00
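The commit title above says the total size calculation skipped the first unpacked dimension. A minimal sketch of the intended arithmetic follows, assuming a hypothetical `VarShape` struct that is not Verilator's actual `VerilatedVarProps` class; the loop shape of the original bug is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a variable's shape; not Verilator's VerilatedVarProps.
struct VarShape {
    std::size_t elemBytes;              // size of one packed element in bytes
    std::vector<std::size_t> unpacked;  // element count of each unpacked dimension
};

// Total size is the element size multiplied by every unpacked dimension,
// starting from dimension 0. A loop that started at index 1 would silently
// drop the first dimension (assumed here to be the kind of error fixed).
std::size_t totalSize(const VarShape& v) {
    std::size_t size = v.elemBytes;
    for (std::size_t dim = 0; dim < v.unpacked.size(); ++dim) {
        size *= v.unpacked[dim];
    }
    return size;
}
```

For a 4-byte element with unpacked dimensions 3 and 5, `totalSize` returns 60; with no unpacked dimensions it returns the element size.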
Wilson Snyder a6deee2083 Fix clock enables with bit-extends, #2299. 2020-04-30 19:22:58 -04:00
Wilson Snyder 9fd4541069 Fix reduction OR on wide data, broke in v4.026, #2300. 2020-04-30 17:53:54 -04:00
Geza Lore 849487da23
Modify --build to be a standalone option (#2294)
- Issue an error when --build is used together with --make
- When given --build, always use GNU Make to perform the build
- Update documentation (examples were good as they were)
- Remove the broken t_flag_build_cmake test

Fixes #2280
2020-04-30 12:54:50 +01:00
Geza Lore aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
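The SIMD commit above describes rendering trace values with SSE2 and without data-dependent branches. The following is an illustrative sketch of that general technique for a single byte, not the code from the commit; the function names `renderByte`, `renderByteScalar` and `renderByteToString` are hypothetical. It falls back to a scalar loop when SSE2 is unavailable.

```cpp
#include <cstdint>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Portable reference: writes 8 ASCII '0'/'1' characters, MSB first.
// Each character is computed arithmetically, so no branch depends on the data.
inline void renderByteScalar(char* dst, std::uint8_t value) {
    for (int i = 0; i < 8; ++i) {
        dst[i] = static_cast<char>('0' | ((value >> (7 - i)) & 1));
    }
}

// Same result as renderByteScalar, using SSE2 when the compiler provides it.
inline void renderByte(char* dst, std::uint8_t value) {
#if defined(__SSE2__)
    const __m128i a = _mm_cvtsi32_si128(value);   // value in byte lane 0
    const __m128i b = _mm_unpacklo_epi8(a, a);    // byte lanes 0 and 1 hold value
    const __m128i c = _mm_shufflelo_epi16(b, 0);  // byte lanes 0..7 hold value
    // Lane i of the mask selects bit (7 - i), so lane 0 tests the MSB.
    const __m128i mask = _mm_set1_epi64x(0x0102040810204080LL);
    // 0xFF in every lane whose selected bit is set, 0x00 otherwise.
    const __m128i set = _mm_cmpeq_epi8(_mm_and_si128(c, mask), mask);
    // '0' - 0 = '0' and '0' - (-1) = '1'.
    const __m128i chars = _mm_sub_epi8(_mm_set1_epi8('0'), set);
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), chars);  // store low 8 bytes
#else
    renderByteScalar(dst, value);
#endif
}

// Convenience wrapper returning the 8 rendered characters as a string.
inline std::string renderByteToString(std::uint8_t value) {
    char buf[8];
    renderByte(buf, value);
    return std::string(buf, 8);
}
```

For example, `renderByteToString(0xA5)` yields `10100101` on both the SSE2 and the scalar path.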
Wilson Snyder b44efe7ef7 Use 'suggest' for consistent wording. 2020-04-28 21:19:19 -04:00
Wilson Snyder 15ad3f46be Fix logical not optimization with empty begin, #2291. 2020-04-28 21:15:20 -04:00
Wilson Snyder 910803e6db Fix error on unpacked connecting to packed, #2288. 2020-04-27 18:38:54 -04:00
Wilson Snyder 87e1c36e4a Support event data type (with some restrictions). 2020-04-25 15:37:46 -04:00
Wilson Snyder 3b37b5b92d Tests: Check output from some unsupported tests. 2020-04-24 08:22:19 -04:00
Geza Lore 10b4678ee6 Make vgen.pl deterministic 2020-04-24 04:53:33 +01:00
Geza Lore 27f4399c31
Fix tests failing on rerun after passing from clean. (#2281) 2020-04-23 21:27:06 -04:00
Wilson Snyder f93ae707e0 Tests: Add bad option test. 2020-04-23 19:56:26 -04:00
Geza Lore 8208fe8a0e
Fix test failures on Ubuntu 20.04 (#2278)
- Packaged SystemC lives in /usr so needed to update regex in test
driver
- Clang 10 complains about mixed named and positional initializers in
struct definitions.
2020-04-23 17:29:37 -04:00
Wilson Snyder ace35b3e81 Tests: Add -G test. 2020-04-23 08:05:14 -04:00
Wilson Snyder 2b58e834ee Tests: Rename IVERILOG define for consistency. No functional change. 2020-04-23 08:05:14 -04:00
Wilson Snyder 7176aee852 Internals: Parse fork and delays, but then still report unsupported. 2020-04-22 21:31:40 -04:00
Wilson Snyder 77915f78db Add experimental-only option. 2020-04-21 20:45:23 -04:00
Geza Lore c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same as what --trace-fst-thread
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace-threads 1. Still, for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
James Hanlon 97cbc10925 Add --flatten for use with --xml-only (#2270). 2020-04-21 18:14:08 -04:00
Wilson Snyder 174fd1bf0e Codacy cleanups. No functional change. 2020-04-20 22:01:47 -04:00
Wilson Snyder b12413e42f Tests: Reenable some tests incorrectly marked unsupported. 2020-04-20 21:55:23 -04:00
Wilson Snyder 15f7685755 Codacy cleanups. No functional change intended. 2020-04-20 21:43:05 -04:00
Wilson Snyder fceedd9f4d Tests: Update static test. 2020-04-19 21:18:57 -04:00
Wilson Snyder 4272f2116e Tests: Update static test. 2020-04-19 20:10:07 -04:00
Geza Lore 6a54922044
Set FST timescale correctly. (#2266)
The FST trace timescale used to be set in the constructor via
set_time_unit, but at that point we haven't normally opened the
file yet so it was just dropped. On top of that, we actually want
to use set_time_resolution... FST trace timescales now match the VCD.
2020-04-19 08:47:22 -04:00
Wilson Snyder 466535abdc Support direct class member init. 2020-04-18 20:20:17 -04:00
Geza Lore efacac2e3d
Tests: Ignore SystemC file paths in expected test results (#2265) 2020-04-18 18:56:19 -04:00
Geza Lore 74e16d85c5
Fix FST trace initial time stamp. (#2264)
If the first dump was not at time zero, then the FST trace used
to contain the initial values as if they were set at time zero. Now
they only appear at the time the first dump call is actually made,
and hence match the VCD trace exactly.
2020-04-18 18:54:02 -04:00
Wilson Snyder 39d7cbf412 Fix arrayed instances connecting to slices, #2263. 2020-04-17 19:30:53 -04:00
Wilson Snyder 8f7e463656 Tests: Fix makeflag test, was failing older makes. 2020-04-16 17:31:41 -04:00
Wilson Snyder d4f7f5297a
Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
2020-04-15 19:39:03 -04:00
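The time units commit above refers to timescales written as a unit and a precision, such as "1ps/1ps". A minimal sketch of how such a string maps to power-of-ten exponents follows. The helper names `parseTimeLiteral`, `parseTimescale` and `ticksPerUnit` are hypothetical and this is not Verilator's parser; it assumes the IEEE rule that magnitudes are 1, 10 or 100 and units are s, ms, us, ns, ps or fs.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; not Verilator's own parser.

// Parses one time literal such as "1ps" or "100ns" into a power-of-ten
// exponent in seconds (for example "1ns" gives -9). Returns false if the
// magnitude is not 1, 10 or 100, or the unit is not s, ms, us, ns, ps or fs.
bool parseTimeLiteral(const std::string& text, int& exponent) {
    int magnitude = 0;
    std::string::size_type pos = 0;
    if (text.compare(0, 3, "100") == 0) {
        magnitude = 2;
        pos = 3;
    } else if (text.compare(0, 2, "10") == 0) {
        magnitude = 1;
        pos = 2;
    } else if (text.compare(0, 1, "1") == 0) {
        magnitude = 0;
        pos = 1;
    } else {
        return false;
    }
    const std::string unit = text.substr(pos);
    int base = 0;
    if (unit == "s") base = 0;
    else if (unit == "ms") base = -3;
    else if (unit == "us") base = -6;
    else if (unit == "ns") base = -9;
    else if (unit == "ps") base = -12;
    else if (unit == "fs") base = -15;
    else return false;
    exponent = base + magnitude;
    return true;
}

// Parses "unit/precision". The precision must be at least as fine as the unit.
bool parseTimescale(const std::string& text, int& unitExp, int& precExp) {
    const std::string::size_type slash = text.find('/');
    if (slash == std::string::npos) return false;
    if (!parseTimeLiteral(text.substr(0, slash), unitExp)) return false;
    if (!parseTimeLiteral(text.substr(slash + 1), precExp)) return false;
    return precExp <= unitExp;
}

// Number of precision ticks in one time unit, i.e. 10^(unitExp - precExp).
std::uint64_t ticksPerUnit(int unitExp, int precExp) {
    std::uint64_t ticks = 1;
    for (int i = precExp; i < unitExp; ++i) ticks *= 10;
    return ticks;
}
```

With "1ns/1ps" the exponents are -9 and -12, so one time unit spans 1000 precision ticks; with "1ps/1ps" it spans exactly one.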
Wilson Snyder 58091edd68 Tests: Fix cmake -j unknown 2020-04-15 18:08:31 -04:00
Yutetsu TAKATSUKASA 18412f9322
Add --build option to call make/cmake as subprocess (#2249)
* Add --build, -j, -MAKEFLAGS, and --no-verilate options
* Verilator: Can build on both gmake and cmake
2020-04-15 17:44:21 -04:00
Geza Lore 1a64c7d232
Fix run-time formatting of variable wider than 1023 bits (#2261) 2020-04-15 17:26:15 -04:00
Geza Lore 08b74e5ab9
Fix crash when formatting constant wider than 1023 bits (#2260) 2020-04-14 18:07:09 -04:00
Geza Lore dc5c259069
Improve tracing performance. (#2257)
Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions from the tracing code.
- Both: Verilator provides clean input, no need to mask out unused bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+--------------+---------------+----------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   165 %       |   115 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
2020-04-14 00:13:10 +01:00
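Three of the tactics listed in the tracing performance commit above (width as a template parameter, change detection through a direct pointer into the previous-value buffer, and precomputed VCD identifier codes) can be sketched together. `MiniVcd` is a hypothetical miniature tracer written for illustration, not Verilator's tracing class, and it assumes every signal starts at 0 and is at most 32 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical miniature tracer for illustration only.
class MiniVcd {
    std::vector<std::uint32_t> m_prev;  // previous value of each signal (assumed to start at 0)
    std::vector<std::string> m_suffix;  // precomputed " <id>\n" text for each signal
    std::string m_out;                  // rendered trace text

    // VCD identifier codes are built from printable characters '!' to '~',
    // which gives 94 symbols per position.
    static std::string makeId(std::size_t index) {
        std::string id;
        do {
            id += static_cast<char>('!' + index % 94);
            index /= 94;
        } while (index != 0);
        return id;
    }

public:
    explicit MiniVcd(std::size_t signals)
        : m_prev(signals, 0) {
        // Compute every identifier once, up front.
        for (std::size_t i = 0; i < signals; ++i) m_suffix.push_back(" " + makeId(i) + "\n");
    }

    // The width is a template parameter, so the bit loop has a compile-time bound.
    template <int T_Width>
    void chgBus(std::size_t code, std::uint32_t value) {
        std::uint32_t* const prevp = &m_prev[code];  // direct pointer to the old value
        if (*prevp == value) return;                 // unchanged: emit nothing
        *prevp = value;
        char bits[T_Width];
        for (int i = 0; i < T_Width; ++i) {
            bits[i] = static_cast<char>('0' | ((value >> (T_Width - 1 - i)) & 1));
        }
        m_out += 'b';
        m_out.append(bits, T_Width);
        m_out += m_suffix[code];  // copy the precomputed identifier text
    }

    const std::string& text() const { return m_out; }
};
```

Calling `chgBus<4>(0, 0xA)` twice emits a single `b1010 !` line, because the second call sees no change against the stored previous value.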
Wilson Snyder dba88bae3c Support class new. 2020-04-12 18:57:12 -04:00
Wilson Snyder ea3acc2d3a Fix --skip-identical, broken by a recent commit. 2020-04-11 20:22:57 -04:00
Wilson Snyder 8e6674066f Tests: Clean before rerunning failing test. 2020-04-11 11:40:15 -04:00
Wilson Snyder 15b40a97d9 Support `unconnected_drive 2020-04-09 23:26:03 -04:00
Wilson Snyder 608d5a87d1 Tests: Avoid assuming a timescale. 2020-04-07 20:55:47 -04:00
Geza Lore 0cfa828572 Fix DPI import/export to be standard compliant, #2236. 2020-04-07 19:07:47 -04:00
Wilson Snyder b6c21ad21a Fix duplicate traces with $dumpfile, part of #2237. 2020-04-06 08:33:51 -04:00
Wilson Snyder 26301a4133 Commentary 2020-04-06 08:19:32 -04:00
Wilson Snyder 383f9832d4 Tests: Standardize verilog indentation. 2020-04-05 21:53:24 -04:00
Wilson Snyder a331397954 Fix real conversion from constant with X/Z. 2020-04-05 11:56:15 -04:00