Commit Graph

539 Commits

Author SHA1 Message Date
Wilson Snyder 5089ac6119 Remove VL_ULL as ULL now in MSVC & C++11 2020-05-28 20:32:07 -04:00
Wilson Snyder 279f21bb5b Configure now enables SystemC if it is installed as a system headers. 2020-05-28 18:51:46 -04:00
Wilson Snyder 773ed97504 Internals Most VerilatedLockGuard can be const. No functional change intended. 2020-05-28 18:23:46 -04:00
Stephen Henry c4aab57c62
Internals: Refactor to introduce VerilatedFdList. (#2363) 2020-05-28 17:59:18 -04:00
Maarten De Braekeleer e8f27be200 Fix Visual Studio compiler issues (#2375)
* Make sure compiler creates same object file as target of rule
* MSVC requires a string return
* Case ranges are a gnu extension which MSVC does not understand
* _dupenv_s also returns 0 if the var could not be found
See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/dupenv-s-wdupenv-s?view=vs-2019
2020-05-28 17:39:20 -04:00
Geza Lore 622f59ad65
Set OPT_FAST=-Os as default (#2374) 2020-05-28 00:57:49 +01:00
Geza Lore d737266f64
Add OPT_GLOBAL to use for run-time library (#2373)
This allows compiling the run-time library with optimization even when OPT_FAST is not used in order to imporove model build speed, possibly during debug cycles.
2020-05-27 01:52:08 +01:00
Geza Lore 9d7086067c Rework serial/parallel build mode
Instead of __ALLfast.cpp and __ALLslow.cpp, we now create only a single
__ALL.cpp and compile it with OPT_FAST, this speeds up small builds
where the C compiler does not dominate. A separate patch will follow
turning VM_PARALLEL_BUILDS on by default at a certain size.

Given this change to the build there is now no point in emitting both
fast and slow routines into the same .cpp file when --output-split is
not set as they will be just included in the same __ALL.cpp file. To
keep things simpler and the output easier to comprehend, V3EmitC has
also been changed to always emit the fast and slow files separately.

Also change verilated.mk to apply OPT_SLOW to all slow files, not just
ones called *__Slow.cpp. This change in particular ensures __Syms.cpp
is build as slow.

Part of #2360.
2020-05-26 01:22:10 +01:00
Jan Van Winkel 424769c32b
Fix warning for unused param in VL_RTOIROUND_Q_D (#2356) 2020-05-25 08:13:12 -04:00
Wilson Snyder 50662751fe Fix compiler unsigned warnings 2020-05-23 22:38:17 -04:00
Wilson Snyder 6a882f9dc6 Internal code coverage improvements. No functional change intended. 2020-05-23 10:34:58 -04:00
Ludwig Rogiers 101314a572
Add VPI error reset to vpi_get_time() (#2347) 2020-05-22 07:09:47 -04:00
Wilson Snyder c64c81b7a3 Workaround missing guard (partial #2333). 2020-05-19 08:31:52 -04:00
Geza Lore 53cde90c8f De-constify fields to appease old GCC with broken STL
No functional change intended, fixes #2342
2020-05-18 19:01:08 +01:00
Stephen Henry ba3930777a
Support display/scan %u/%z (#2324) (#2332) 2020-05-18 08:10:32 -04:00
Stephen Henry cef0105dfc
Fix requiring C++11 algorithms. (#2339) (#2340) 2020-05-17 11:44:42 -04:00
Wilson Snyder ed4c7038b4 Tracing: Remove dead code. No functional change intended. 2020-05-17 09:53:58 -04:00
Wilson Snyder c4f31d3bb6 Tracing: Remove dead code. No functional change intended. 2020-05-17 09:52:03 -04:00
Wilson Snyder 17e7da77f0 Misc internal coverage improvements. 2020-05-16 18:02:54 -04:00
Wilson Snyder 6fd7f45cef Internals: Remove dead needHInlines code 2020-05-16 07:53:27 -04:00
Wilson Snyder 57a937df03 Misc internal coverage cleanups 2020-05-16 07:43:22 -04:00
Wilson Snyder 29bcbb0417 Suppress impossible code coverage issues 2020-05-15 22:34:29 -04:00
Wilson Snyder 35a53d9adb Add t_trace_c_api test. 2020-05-15 20:38:08 -04:00
Geza Lore 900c023bb5 Refactor trace implementation to allow experimentation
The main goal of this patch is to enable splitting the full and
incremental tracing functions into multiple functions, which can then be
run in parallel at a later stage. It also simplifies further
experimentation as all of the interesting trace code construction now
happens in V3Trace. No functional change is intended by this patch, but
there are some implementation changes in the generated code.

Highlights:
- Pass symbol table directly to trace callbacks for simplicity.
- A new traceRegister function is generated which adds each trace
function as an individual callback, which means we can have multiple
callbacks for each trace function type.
- A new traceCleanup function is generated which clears the activity
flags, as the trace callbacks might be implemented as multiple functions.
- Re-worked sub-function handling so there is no separate sub-function
for each trace activity class. Sub-functions are generate when required
by splitting.
- traceFull/traceChg are now created in V3Trace rather than V3TraceDecl,
this requires carrying the trace value tree in TraceDecl until it
reaches V3Trace where the TraceInc nodes are created (previously a
TraceInc was also created in V3TraceDecl which carries the value).
2020-05-15 18:34:29 +01:00
Wilson Snyder c1a9fe07e9 Support multi channel descriptor I/O (#2190)
clang-format and Changes update. No functional change.
2020-05-14 18:14:50 -04:00
Stephen Henry 1a0da2e4ec
Support multi-channel descriptor (MCD) I/O (#2197) 2020-05-14 18:03:00 -04:00
Wilson Snyder f005b7fd87 Support scan %* format 2020-05-11 22:13:59 -04:00
Wilson Snyder 29695adf70 Fix 10s/100s timeunits. 2020-05-11 08:15:52 -04:00
Yossi Nivin f9a0cf0cff
Support $countbits (#2287) 2020-05-10 14:27:22 -04:00
Geza Lore ac09ad3ffd
Minor improvements to DPI open array handling (#2316)
- Allow arbitrary number of open array dimensions, not just 3. Note
right now this only works with the array querying functions specified
in IEEE 1800-2017 H.12.2
- Issue error when passing dynamic array or queue as DPI open array
(currently unsupported)
- Also tweaked AstVar::vlArgTypeRecurse, which should now error or fail
for unsupported types.
2020-05-08 18:22:44 +01:00
Dan Petrisko ee1b20e1cd
Adding missing sstream include (#2312) 2020-05-06 19:16:41 -04:00
Geza Lore 8afcd67a1f Fix FST tracing of little endian vectors 2020-05-03 22:39:45 +01:00
John Demme 6e9008fb5a
Fix VerilatedVarProps::totalSize missing the first unpacked dim (#2296) 2020-05-01 07:42:29 -04:00
Wilson Snyder 5ded80cf79 Fix MacOs Homebrew by removing default LIBS, #2298. 2020-04-30 19:53:21 -04:00
Wilson Snyder 9fd4541069 Fix reduction OR on wide data, broke in v4.026, #2300. 2020-04-30 17:53:54 -04:00
Peter Horvath dc64b43152
Fix xcode clang bug workaround (#2295) 2020-04-30 07:20:31 -04:00
Geza Lore 209a585a68 Remove VL_NEGATE_{I,Q,E}, use C native unary '-' instead
This is to avoid slowing down -O0 models unnecessarily.
2020-04-30 01:05:52 +01:00
Geza Lore aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Geza Lore dd967f7769 Improve trace buffer memory utilization and performance.
Convert trace buffer to 32-bit entries, rather than a union containing a
pointer type. Also tweaked trace entry layouts for a bit more
performance. This gains another 10% on SweRV EH1 CoreMark.
2020-04-27 19:00:17 +01:00
Geza Lore b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Geza Lore 9991b19610 Another attempt at flushing threaded VCD correctly. 2020-04-25 18:40:09 +01:00
Geza Lore c1665818b9
Fix missing flush with threaded VCD tracing. (#2282)
VerilatedVcdC::openNext() failed to flush the tracing thread before
opening the next output file, which caused t_trace_cat.pl to fail
with --vltmt on occasion.
2020-04-24 03:09:26 +01:00
Wilson Snyder df52e481fb Collected minor output code cleanups. 2020-04-23 21:22:47 -04:00
Geza Lore c96a43b452
Fix unused variable in VL_READMEM_N (#2274) 2020-04-22 17:25:35 -04:00
Geza Lore c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
Wilson Snyder 174fd1bf0e Codacy cleanups. No functional change. 2020-04-20 22:01:47 -04:00
Geza Lore 39d903375b
Factor out trace implementation common to all formats. (#2268)
This patch de-duplicates common functionality between the VCD and FST
trace implementation. It also enables adding new trace formats more
easily and consistently.

No functional nor performance change intended.
2020-04-19 23:57:36 +01:00
Geza Lore 6a54922044
Set FST timescale correctly. (#2266)
The FST trace timescale used to be set in the constructor via
set_time_unit, but at that point we haven't normally opened the
file yet so it was just dropped. On top of that, we actually want
to use set_time_resolution... FST trace timescales now match the VCD.
2020-04-19 08:47:22 -04:00
Geza Lore 74e16d85c5
Fix FST trace initial time stamp. (#2264)
If the first dump was not at time zero, then the FST trace used
to contain the initial values as if they were set at time zero. Now
they only appear at the time the first dump call is actually made,
and hence match the VCD trace exactly.
2020-04-18 18:54:02 -04:00
Wilson Snyder e6f345e45d Internal: clang-tidy fixes. No functional change. 2020-04-15 21:47:37 -04:00