Commit Graph

408 Commits

Author SHA1 Message Date
Wilson Snyder ac04e85a1c C++11: More range for. No functional change intended. 2020-08-16 12:54:32 -04:00
Wilson Snyder 78aee6f4e7 C++11: Use sized enums (+4% performance). 2020-08-16 12:05:35 -04:00
Wilson Snyder ee9d6dd63f C++11: Favor auto, range for. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder 72d2cff0a1 C++11: Use member declaration initalizations. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder 033e7ac020 C++11: Use member declaration initalizations. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder 042d3eed23 C++11: Use override where possible. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder 7c54a451a9 C++11: Remove pre-c11 VL_OVERRIDE etc. No functional change. 2020-08-16 11:44:05 -04:00
Edgar E. Iglesias 5d98035170
Fix sc names (#2500)
cint.mainInt(nodep) walks the tree and populates m_ctorVarsVec.
Reuse EmitCImp cint for the slow mainImp() emition steps to make sure
we emit constructor calls to setup SystemC sc_module names.
2020-08-13 08:23:02 -04:00
Wilson Snyder 98cd925fda Fix non-32 bit conversion to float (#2495). 2020-08-06 21:56:43 -04:00
Wilson Snyder 2bc813da4a Refactor and test VL_MULS_MAX_WORDS 2020-08-03 22:12:24 -04:00
Ludwig Rogiers f13fd4478c Init params in constructor to support pre-c++11 compilers 2020-06-22 09:01:39 +01:00
Ludwig Rogiers c367b671b6
Support VPI parameter and localparam (#2370) 2020-06-12 18:38:01 -04:00
Geza Lore fac89c5d62
Close trace on vl_fatal/vl_finish (#2414)
This is required to get the last bit of FST trace and close the FST file
properly on $stop or assertion failure.
2020-06-12 07:15:42 +01:00
Wilson Snyder 6de78d58fa Add new UNSUPPORTED error code to replace most previous Unsupported: messages. 2020-06-09 19:20:16 -04:00
Wilson Snyder a57826b125 Line Coverage now tracks all statement lines, not just branch lines. 2020-05-31 15:52:17 -04:00
Geza Lore 95534fa5c5
Remove unused headers (#2389) 2020-05-31 20:21:07 +01:00
Wilson Snyder 611e8b10c6 Fix Verilog coverage to pass real column information 2020-05-31 09:05:10 -04:00
Geza Lore 9712ceedd7 Internals: Remove empty statements. No functional change intended.
Remove stray semicolons, mostly by capturing them in macros accurately.
This removes a ton on lint warnings from CLion.
2020-05-30 19:13:18 +01:00
Wilson Snyder 5089ac6119 Remove VL_ULL as ULL now in MSVC & C++11 2020-05-28 20:32:07 -04:00
Wilson Snyder 773ed97504 Internals Most VerilatedLockGuard can be const. No functional change intended. 2020-05-28 18:23:46 -04:00
Maarten De Braekeleer e8f27be200 Fix Visual Studio compiler issues (#2375)
* Make sure compiler creates same object file as target of rule
* MSVC requires a string return
* Case ranges are a gnu extension which MSVC does not understand
* _dupenv_s also returns 0 if the var could not be found
See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/dupenv-s-wdupenv-s?view=vs-2019
2020-05-28 17:39:20 -04:00
Wilson Snyder c5da38206e Fix configure over-disabling warnings. 2020-05-27 08:45:11 -04:00
Geza Lore 72858175a2 Only emit VM_PARALLEL_BUILDS=1 iff --output-split caused a split.
Previously we set VM_PARALLEL_BUILDS=1 if the --output-split option was
provided. Now we only do it iff it actually causes a split.
2020-05-26 01:22:10 +01:00
Geza Lore 9d7086067c Rework serial/parallel build mode
Instead of __ALLfast.cpp and __ALLslow.cpp, we now create only a single
__ALL.cpp and compile it with OPT_FAST, this speeds up small builds
where the C compiler does not dominate. A separate patch will follow
turning VM_PARALLEL_BUILDS on by default at a certain size.

Given this change to the build there is now no point in emitting both
fast and slow routines into the same .cpp file when --output-split is
not set as they will be just included in the same __ALL.cpp file. To
keep things simpler and the output easier to comprehend, V3EmitC has
also been changed to always emit the fast and slow files separately.

Also change verilated.mk to apply OPT_SLOW to all slow files, not just
ones called *__Slow.cpp. This change in particular ensures __Syms.cpp
is build as slow.

Part of #2360.
2020-05-26 01:22:10 +01:00
Vassilis Papaefstathiou a7432bdea7 Support wide operands in queues and dynamic arrays (#2352) 2020-05-23 21:59:56 -04:00
Stephen Henry ba3930777a
Support display/scan %u/%z (#2324) (#2332) 2020-05-18 08:10:32 -04:00
Wilson Snyder 4773a1e77c Misc internal coverage improvements. 2020-05-17 11:06:14 -04:00
Wilson Snyder 6fd7f45cef Internals: Remove dead needHInlines code 2020-05-16 07:53:27 -04:00
Geza Lore 900c023bb5 Refactor trace implementation to allow experimentation
The main goal of this patch is to enable splitting the full and
incremental tracing functions into multiple functions, which can then be
run in parallel at a later stage. It also simplifies further
experimentation as all of the interesting trace code construction now
happens in V3Trace. No functional change is intended by this patch, but
there are some implementation changes in the generated code.

Highlights:
- Pass symbol table directly to trace callbacks for simplicity.
- A new traceRegister function is generated which adds each trace
function as an individual callback, which means we can have multiple
callbacks for each trace function type.
- A new traceCleanup function is generated which clears the activity
flags, as the trace callbacks might be implemented as multiple functions.
- Re-worked sub-function handling so there is no separate sub-function
for each trace activity class. Sub-functions are generate when required
by splitting.
- traceFull/traceChg are now created in V3Trace rather than V3TraceDecl,
this requires carrying the trace value tree in TraceDecl until it
reaches V3Trace where the TraceInc nodes are created (previously a
TraceInc was also created in V3TraceDecl which carries the value).
2020-05-15 18:34:29 +01:00
Geza Lore 12b95f6b93 Clean up V3TraceDecl & V3Trace. No functional change intended.
- Constify variables
- Remove redundancies
- [Hopefully] make some code a bit more readable
2020-05-15 18:34:29 +01:00
Stephen Henry 1a0da2e4ec
Support multi-channel descriptor (MCD) I/O (#2197) 2020-05-14 18:03:00 -04:00
Wilson Snyder f005b7fd87 Support scan %* format 2020-05-11 22:13:59 -04:00
Wilson Snyder 61e41595a2 Fix clang warning 2020-05-11 20:31:37 -04:00
Wilson Snyder 29695adf70 Fix 10s/100s timeunits. 2020-05-11 08:15:52 -04:00
Yossi Nivin f9a0cf0cff
Support $countbits (#2287) 2020-05-10 14:27:22 -04:00
Wilson Snyder 546ccd56c4 Internals: Enable future JumpGo to non-end. No functional change intended. 2020-05-06 21:33:05 -04:00
Geza Lore aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Geza Lore dd967f7769 Improve trace buffer memory utilization and performance.
Convert trace buffer to 32-bit entries, rather than a union containing a
pointer type. Also tweaked trace entry layouts for a bit more
performance. This gains another 10% on SweRV EH1 CoreMark.
2020-04-27 19:00:17 +01:00
Geza Lore b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Wilson Snyder df52e481fb Collected minor output code cleanups. 2020-04-23 21:22:47 -04:00
Geza Lore c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
Wilson Snyder e6f345e45d Internal: clang-tidy fixes. No functional change. 2020-04-15 21:47:37 -04:00
Wilson Snyder d4f7f5297a
Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
2020-04-15 19:39:03 -04:00
Geza Lore 1a64c7d232
Fix run-time formatting of variable wider than 1023 bits (#2261) 2020-04-15 17:26:15 -04:00
Wilson Snyder f3308d236b clang-format remaining sources. No functional change. 2020-04-15 07:58:34 -04:00
Geza Lore dc5c259069
Improve tracing performance. (#2257)
* Improve tracing performance.

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions form the tracing code.
- Both: Verilator provides clean input, no need to mask out used bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+-----------------------------------------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   256 %       |   215 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
2020-04-14 00:13:10 +01:00
Wilson Snyder dba88bae3c Support class new. 2020-04-12 18:57:12 -04:00
Wilson Snyder afa8e4c786 Internals: Favor const_iterator. No functional change. 2020-04-11 10:54:42 -04:00
Wilson Snyder 1a6c2fc55d Fix class members getting misoptimized away. 2020-04-10 21:10:21 -04:00