Commit Graph

2415 Commits

Author SHA1 Message Date
Wilson Snyder 01f3e81a36 Internals: Parse extend/implements/etc using generic identifiers. 2020-05-21 21:31:15 -04:00
Wilson Snyder 3cb3b6c400 clang-format verilog.l. No functional change. 2020-05-21 19:46:21 -04:00
Wilson Snyder 8f58b58853 Internals: Parse using idAny where can to better detect id errors. 2020-05-20 23:29:37 -04:00
Wilson Snyder b66826169d Internals: Refactor inherits/extends parse so don't need type vs nontype IDs. 2020-05-20 23:10:45 -04:00
Wilson Snyder d2a7376f14 Fix spacing. No functional change 2020-05-20 22:33:32 -04:00
Wilson Snyder c9e8a1cb4d Commentary 2020-05-20 20:31:57 -04:00
Geza Lore d20a4db773 Fix regression due to early constant folding in +: and -: (#2338) 2020-05-18 18:46:00 +01:00
Stephen Henry ba3930777a Support display/scan %u/%z (#2324) (#2332) 2020-05-18 08:10:32 -04:00
Geza Lore 9c054a5774 Optimize trace activity flags a bit more
- Improve flag pruning heuristic
- Set all trace activity flags in slow code. This in turn enables us
to remove checking the slow flag on the fast path.
2020-05-17 19:41:24 +01:00
Wilson Snyder 4773a1e77c Misc internal coverage improvements. 2020-05-17 11:06:14 -04:00
Geza Lore dc25e9b949 Optimize fine grained trace activity flags (#2336)
Firstly, we always use a byte array for fine grained activity flags
instead of a bit vector (we used to use a byte array only if we had
parallel mtasks). The byte vector can be set more cheaply in eval,
closing about 1/3 of the gap in performance between compiling with
or without --trace on SweRV EH1. The speed of tracing itself is not
measurably different.

Secondly, we prune the activity tracking such that if a set of activity
flag combinations only guard a small number of signals, we will turn
those signals into always traced signals. This avoids code which
sometimes tests dozens of activity flags just to subsequently check one
signal and dump it if its value changed. We can just check the signal
state straight instead, and not bother with the flags. This removes
about 30% of activity flags in SweRV EH1, and makes both single threaded
VCD and FST tracing 8-9% faster.
2020-05-17 13:53:02 +01:00
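
The byte-array versus bit-vector trade-off described in commit dc25e9b949 can be illustrated with a minimal sketch. The type names and the flag count below are hypothetical and are not taken from Verilator's generated code; the point is only that a byte flag is set with one plain store, while a bit flag needs a read-modify-write on a shared word.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flag count, chosen only for illustration.
constexpr std::size_t kNumFlags = 64;

// Bit-vector form: setting a flag reads, modifies and writes a shared word.
struct BitFlags {
    std::uint64_t bits = 0;
    void set(std::size_t i) {
        assert(i < kNumFlags);
        bits |= (std::uint64_t{1} << i);
    }
    bool test(std::size_t i) const { return ((bits >> i) & 1U) != 0; }
};

// Byte-array form: setting a flag is a single independent byte store.
struct ByteFlags {
    std::array<std::uint8_t, kNumFlags> bytes{};
    void set(std::size_t i) {
        assert(i < kNumFlags);
        bytes[i] = 1;
    }
    bool test(std::size_t i) const { return bytes[i] != 0; }
};
```

Both forms answer the same queries; they differ only in how much work a set operation does and in how much memory the flags occupy.
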
Wilson Snyder 17e7da77f0 Misc internal coverage improvements. 2020-05-16 18:02:54 -04:00
Wilson Snyder 9c0c6439cc Clean additional objects. 2020-05-16 13:28:03 -04:00
Wilson Snyder d33d0301f8 Support verilator_coverage --write-info for lcov HTML reports. 2020-05-16 09:18:35 -04:00
Wilson Snyder 6fd7f45cef Internals: Remove dead needHInlines code 2020-05-16 07:53:27 -04:00
Wilson Snyder 57a937df03 Misc internal coverage cleanups 2020-05-16 07:43:22 -04:00
Wilson Snyder 35a53d9adb Add t_trace_c_api test. 2020-05-15 20:38:08 -04:00
Wilson Snyder 2885c2ce97 Fix coredump on countbits. 2020-05-15 19:29:17 -04:00
Geza Lore 900c023bb5 Refactor trace implementation to allow experimentation
The main goal of this patch is to enable splitting the full and
incremental tracing functions into multiple functions, which can then be
run in parallel at a later stage. It also simplifies further
experimentation as all of the interesting trace code construction now
happens in V3Trace. No functional change is intended by this patch, but
there are some implementation changes in the generated code.

Highlights:
- Pass symbol table directly to trace callbacks for simplicity.
- A new traceRegister function is generated which adds each trace
function as an individual callback, which means we can have multiple
callbacks for each trace function type.
- A new traceCleanup function is generated which clears the activity
flags, as the trace callbacks might be implemented as multiple functions.
- Re-worked sub-function handling so there is no separate sub-function
for each trace activity class. Sub-functions are generated when required
by splitting.
- traceFull/traceChg are now created in V3Trace rather than V3TraceDecl,
this requires carrying the trace value tree in TraceDecl until it
reaches V3Trace where the TraceInc nodes are created (previously a
TraceInc was also created in V3TraceDecl which carries the value).
2020-05-15 18:34:29 +01:00
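
The callback registration idea in commit 900c023bb5 (several callbacks per trace function type, followed by a cleanup step that clears activity flags) might look roughly like the sketch below. `SymbolTable`, `TraceRegistry`, `addChgCb` and `addCleanupCb` are hypothetical names invented for this illustration and are not Verilator's actual API.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated symbol table.
struct SymbolTable {
    int activity = 0;  // nonzero when something changed
    int dumped = 0;    // counts how many callbacks emitted data
};

// Hypothetical registry: each trace function is added as its own callback,
// so one callback type can be backed by several functions.
class TraceRegistry {
public:
    using Callback = std::function<void(SymbolTable&)>;

    void addChgCb(Callback cb) {
        assert(static_cast<bool>(cb));
        m_chgCbs.push_back(std::move(cb));
    }
    void addCleanupCb(Callback cb) {
        assert(static_cast<bool>(cb));
        m_cleanupCbs.push_back(std::move(cb));
    }
    // Run every change callback, then every cleanup callback.
    void dump(SymbolTable& syms) {
        for (auto& cb : m_chgCbs) cb(syms);
        for (auto& cb : m_cleanupCbs) cb(syms);
    }

private:
    std::vector<Callback> m_chgCbs;
    std::vector<Callback> m_cleanupCbs;
};
```

Clearing the flags in a separate cleanup pass means no individual change callback has to know whether it is the last one to run.
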
Geza Lore 12b95f6b93 Clean up V3TraceDecl & V3Trace. No functional change intended.
- Constify variables
- Remove redundancies
- [Hopefully] make some code a bit more readable
2020-05-15 18:34:29 +01:00
Stephen Henry 1a0da2e4ec Support multi-channel descriptor (MCD) I/O (#2197) 2020-05-14 18:03:00 -04:00
Huang Rui 68d7596adf Fix compile error when using bison 3.6.1 (#2320)
Workaround issue: bison 3.6.1 generated unexpected nested-comment
Closes: https://github.com/verilator/verilator/issues/2320
Signed-off-by: Huang Rui <vowstar@gmail.com>
2020-05-13 19:18:56 -04:00
Wilson Snyder f005b7fd87 Support scan %* format 2020-05-11 22:13:59 -04:00
Wilson Snyder 61e41595a2 Fix clang warning 2020-05-11 20:31:37 -04:00
Stephen Henry 484b574cef Fix crash on self-referential enum type. (#2319) 2020-05-11 18:44:28 -04:00
Wilson Snyder 29695adf70 Fix 10s/100s timeunits. 2020-05-11 08:15:52 -04:00
Wilson Snyder 15f63d12d5 Fix message for seeded random. 2020-05-10 21:15:48 -04:00
Wilson Snyder ba7b3fd60f Support $display(,,). 2020-05-10 20:48:18 -04:00
Wilson Snyder 897b9ccfe2 Fix display of huge double. 2020-05-10 16:03:46 -04:00
Wilson Snyder b97e1aa9fe Support cast to string 2020-05-10 15:42:16 -04:00
Wilson Snyder d4a631446b Fix crash in unroller on increment-only while loops. 2020-05-10 15:26:41 -04:00
Wilson Snyder 8998ffc4e5 Support reporting some fork syntax errors. 2020-05-10 15:01:43 -04:00
Wilson Snyder ca162716c6 Internals: clang-format 2020-05-10 14:29:15 -04:00
Wilson Snyder 12b903caf4 Improve error on using array.unique method. 2020-05-10 14:28:42 -04:00
Yossi Nivin f9a0cf0cff Support $countbits (#2287) 2020-05-10 14:27:22 -04:00
Wilson Snyder 070bcddf5a Support unpacked array .sum and .product. 2020-05-10 12:48:33 -04:00
Wilson Snyder feb1e2bd48 Commentary 2020-05-10 11:01:57 -04:00
Wilson Snyder 6e7ee23644 Internals: Code cleanups. 2020-05-09 15:00:46 -04:00
Wilson Snyder c00cc18d37 Optimize dead code after gotos 2020-05-09 15:00:36 -04:00
Wilson Snyder a7e17a8855 Fix double conversion on half of conditional. 2020-05-08 21:35:45 -04:00
Geza Lore ac09ad3ffd Minor improvements to DPI open array handling (#2316)
- Allow arbitrary number of open array dimensions, not just 3. Note
right now this only works with the array querying functions specified
in IEEE 1800-2017 H.12.2
- Issue error when passing dynamic array or queue as DPI open array
(currently unsupported)
- Also tweaked AstVar::vlArgTypeRecurse, which should now error or fail
for unsupported types.
2020-05-08 18:22:44 +01:00
Wilson Snyder 9375d9f603 Fix $isunknown with constant Zs. 2020-05-07 21:40:08 -04:00
Wilson Snyder 72bd91c7f1 Support $isunbounded and parameter $. (#2104) 2020-05-07 21:12:58 -04:00
Wilson Snyder 8850ca962e Fix newish error to use standard parens to ref IEEE. 2020-05-07 21:12:58 -04:00
Wilson Snyder 5f7ae1fbce wip 2020-05-07 21:04:26 -04:00
Wilson Snyder b56a25e89c Fix newish error to use standard parens to ref IEEE. 2020-05-07 18:21:11 -04:00
Geza Lore 6a54fb6f96 Modify std::multimap in V3Combine safely.
We used to iterate the m_callMmap std::multimap by getting limit
iterators from equal_range, but we also modify the same map in the loop
which invalidates those limit iterators. Note this only caused actual
problems if the new AstCCall inserted via 'addCall' in the loop had a
memory address (which is used as the key) which fell into the range
returned by equal_range, so was pretty hard to trigger.
2020-05-07 14:31:43 +01:00
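
The hazard described in commit 6a54fb6f96 (walking a range obtained from `equal_range` while inserting into the same `std::multimap`) can be sidestepped by copying the matching values before mutating the map. The sketch below shows that general pattern with hypothetical names and integer keys; it is not necessarily how the patch itself resolves the issue.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical map type; the commit describes keys that are node addresses.
using CallMap = std::multimap<int, int>;

// For every value stored under `key`, insert a copy under `newKey`.
// The values are snapshotted first, so the loop that mutates the map
// never depends on range iterators taken before the insertions.
int duplicateCalls(CallMap& calls, int key, int newKey) {
    std::vector<int> snapshot;
    const auto range = calls.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        snapshot.push_back(it->second);
    }
    for (int value : snapshot) calls.emplace(newKey, value);
    assert(calls.count(newKey) >= snapshot.size());
    return static_cast<int>(snapshot.size());
}
```

When `newKey` equals `key`, a loop that inserted while still walking the original range could keep visiting the elements it had just added; the snapshot keeps the work bounded.
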
Wilson Snyder 546ccd56c4 Internals: Enable future JumpGo to non-end. No functional change intended. 2020-05-06 21:33:05 -04:00
Wilson Snyder ca77a93214 Add lint check for bad delay locations. 2020-05-06 19:25:13 -04:00
Yutetsu TAKATSUKASA aa86e0bbc0 Support 'E', 'p', and 'P' when overriding floating point parameter. (#2310) 2020-05-06 07:45:07 -04:00
Wilson Snyder b6b3482010 Internals: Use typ delay by default 2020-05-05 20:42:19 -04:00
Wilson Snyder 05aecd2c0b Internals: Fix tabs in astgen. No effective functional change. 2020-05-05 20:33:35 -04:00
Wilson Snyder 6ab3d8f3ed Internals: Refactor to add AstNodeProcedure. No functional change intended. 2020-05-05 19:12:36 -04:00
Wilson Snyder 7d7e67b49b Show Verilog reference on V3Number asserts. 2020-05-04 19:57:21 -04:00
Wilson Snyder a41ea180fa Fix +: and -: on unpacked arrays. (#2304) 2020-05-04 19:40:50 -04:00
Wilson Snyder 8f64e4a76f Support $root, #2150. 2020-05-02 08:29:20 -04:00
Wilson Snyder a6deee2083 Fix clock enables with bit-extends, #2299. 2020-04-30 19:22:58 -04:00
Wilson Snyder 9fd4541069 Fix reduction OR on wide data, broke in v4.026, #2300. 2020-04-30 17:53:54 -04:00
Geza Lore 849487da23 Modify --build to be a standalone option (#2294)
- Issue an error when --build is used together with --make
- When given --build, always use GNU Make to perform the build
- Update documentation (examples were good as they were)
- Remove the broken t_flag_build_cmake test

Fixes #2280
2020-04-30 12:54:50 +01:00
Geza Lore 209a585a68 Remove VL_NEGATE_{I,Q,E}, use C native unary '-' instead
This is to avoid slowing down -O0 models unnecessarily.
2020-04-30 01:05:52 +01:00
Geza Lore aa9cde22c8 Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
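
Commit aa9cde22c8 mentions rendering without data dependent branches. The sketch below is a plain scalar illustration of that idea only, not the SSE2 code from the patch: each output character is computed arithmetically from the bit value, so the only branch is the loop bound. The function name is hypothetical, and widths from 1 to 32 are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Render the low `width` bits of `value` as ASCII '0'/'1', MSB first.
// No branch depends on the data: the character is '0' plus the bit.
std::string renderBits(std::uint32_t value, unsigned width) {
    assert(width >= 1 && width <= 32);
    std::string out(width, '0');
    for (unsigned i = 0; i < width; ++i) {
        const std::uint32_t bit = (value >> (width - 1U - i)) & 1U;
        out[i] = static_cast<char>('0' + bit);
    }
    return out;
}
```

A SIMD version would apply the same arithmetic to many bits at once, which is where the intrinsics described in the commit come in.
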
Wilson Snyder b44efe7ef7 Use 'suggest' for consistent wording. 2020-04-28 21:19:19 -04:00
Wilson Snyder 15ad3f46be Fix logical not optimization with empty begin, #2291. 2020-04-28 21:15:20 -04:00
Wilson Snyder c6d1a9858a Use clang-format 10.0.0 2020-04-28 18:47:59 -04:00
Wilson Snyder 910803e6db Fix error on unpacked connecting to packed, #2288. 2020-04-27 18:38:54 -04:00
Geza Lore dd967f7769 Improve trace buffer memory utilization and performance.
Convert trace buffer to 32-bit entries, rather than a union containing a
pointer type. Also tweaked trace entry layouts for a bit more
performance. This gains another 10% on SweRV EH1 CoreMark.
2020-04-27 19:00:17 +01:00
Geza Lore b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the others. There is a very small (< 1%) penalty for
this on SweRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Wilson Snyder 70549e1a64 Internals: Parse lifetime directives; still unsupported. 2020-04-26 12:45:06 -04:00
Wilson Snyder 87e1c36e4a Support event data type (with some restrictions). 2020-04-25 15:37:46 -04:00
Wilson Snyder 5e575f5906 Fix line numbers in tables. 2020-04-24 19:34:26 -04:00
Wilson Snyder df52e481fb Collected minor output code cleanups. 2020-04-23 21:22:47 -04:00
Wilson Snyder f93ae707e0 Tests: Add bad option test. 2020-04-23 19:56:26 -04:00
Geza Lore 6ed10b7fde Fix --protect-lib generated library link rules (#2279)
We used to include a .cpp file on the link line for the shared library,
which was ignored, but generated a .d file for the .so which contained
the header files required by the .cpp file. This then caused a rebuild
where we included the .d in verilated.mk, which placed the .h headers
among the prerequisites of the .so, yielding a clang error about treating
.h files as c++-header rather than c-header... Long story short, we don't
do that anymore. This used to cause t_a4_examples to fail on occasion.

Note there is no need for a separate compilation rule for the
<--protect-lib>.cpp, as it will just pick up the standard OPT_FAST rule.
2020-04-23 17:30:23 -04:00
Wilson Snyder 7176aee852 Internals: Parse fork and delays, but then still report unsupported. 2020-04-22 21:31:40 -04:00
Wilson Snyder 77915f78db Add experimental-only option. 2020-04-21 20:45:23 -04:00
Geza Lore c52f3349d1 Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same as what --trace-fst-thread
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace-threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
James Hanlon 97cbc10925 Add --flatten for use with --xml-only (#2270). 2020-04-21 18:14:08 -04:00
James Hanlon 65cd4f6047 Fix comment and add to CONTRIBUTORS (#2270). 2020-04-21 18:11:53 -04:00
Wilson Snyder 15f7685755 Codacy cleanups. No functional change intended. 2020-04-20 21:43:05 -04:00
Wilson Snyder def40fab9b Internals: Rename VSigning 2020-04-19 21:19:09 -04:00
Wilson Snyder 9164eb03d5 Show that class parameters even if unused are unsupported. 2020-04-19 18:36:55 -04:00
Wilson Snyder 466535abdc Support direct class member init. 2020-04-18 20:20:17 -04:00
Wilson Snyder 39d7cbf412 Fix arrayed instances connecting to slices, #2263. 2020-04-17 19:30:53 -04:00
Wilson Snyder e6f345e45d Internal: clang-tidy fixes. No functional change. 2020-04-15 21:47:37 -04:00
Wilson Snyder d4f7f5297a Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
2020-04-15 19:39:03 -04:00
Yutetsu TAKATSUKASA 18412f9322 Add --build option to call make/cmake as subprocess (#2249)
* Add --build, -j, -MAKEFLAGS, and --no-verilate options
* Verilator: Can build on both gmake and cmake
2020-04-15 17:44:21 -04:00
Wilson Snyder 1883ab29cb clang-format 10.0 forward compatibility. No functional change. 2020-04-15 17:36:57 -04:00
Geza Lore 1a64c7d232 Fix run-time formatting of variable wider than 1023 bits (#2261) 2020-04-15 17:26:15 -04:00
Wilson Snyder f3308d236b clang-format remaining sources. No functional change. 2020-04-15 07:58:34 -04:00
Wilson Snyder 1b94e3b0e2 Internals: clang-format files needed for #2249. 2020-04-14 19:55:00 -04:00
Geza Lore 08b74e5ab9 Fix crash when formatting constant wider than 1023 bits (#2260) 2020-04-14 18:07:09 -04:00
Wilson Snyder 5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Geza Lore dc5c259069 Improve tracing performance. (#2257)
* Improve tracing performance.

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions from the tracing code.
- Both: Verilator provides clean input, no need to mask out unused bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+-----------------------------------------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   256 %       |   215 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
2020-04-14 00:13:10 +01:00
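
Two of the tactics listed in commit dc5c259069, taking the signal width as a template parameter and checking for changes through a direct pointer into the previous-value buffer, can be sketched as follows. `emitIfChanged` is a hypothetical name, and the output format is simplified to a bare string of '0'/'1' characters; this is an illustration under those assumptions, not the patch's code.

```cpp
#include <cassert>
#include <cstdint>

// Compare the new value against the slot holding the previous value.
// If it changed, update the slot and write T_Bits characters to `out`.
// Returns the number of characters written (0 when nothing changed).
template <int T_Bits>
int emitIfChanged(std::uint32_t* oldp, std::uint32_t newval, char* out) {
    static_assert(T_Bits >= 1 && T_Bits <= 32, "width must fit in one word");
    assert(oldp != nullptr && out != nullptr);
    if (*oldp == newval) return 0;  // unchanged: nothing to emit
    *oldp = newval;
    // The loop bound is a compile-time constant, so the compiler can
    // unroll or specialize this per width.
    for (int i = 0; i < T_Bits; ++i) {
        out[i] = static_cast<char>('0' + ((newval >> (T_Bits - 1 - i)) & 1U));
    }
    return T_Bits;
}
```

Because the caller passes the address of the previous value directly, no index-to-address arithmetic is needed at the point of comparison.
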
Wilson Snyder dba88bae3c Support class new. 2020-04-12 18:57:12 -04:00
Wilson Snyder d4b6e2b2b5 Internals: NodeModule for packages. 2020-04-12 14:53:10 -04:00
Wilson Snyder 1e2d73fc80 Internals: clang-format and refactor taskref pin handling. 2020-04-12 08:26:14 -04:00
Wilson Snyder ea3acc2d3a Fix --skip-identical broke recent commit. 2020-04-11 20:22:57 -04:00
Geza Lore 8b2666cd04 Fix to make trace code allocation dense. (#2250)
This looks like a bits/bytes bug. The affected m_codeInc member
determines how many 32-bit words to allocate in a buffer used to store
previous values of the signal, but this was off by a factor of 8, so
we used to use too much memory.

SweRV VCD tracing speed +6.5% (excluding IO, clang 6.0), due mainly to
reduced D cache misses.
2020-04-11 16:00:43 +01:00
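
Commit 8b2666cd04 concerns how many 32-bit words are reserved per signal in the previous-value buffer. A dense allocation would round the signal width in bits up to whole words, as in the sketch below. The helper name is hypothetical, and the formula is an assumption about what "dense" means here rather than a copy of the fix.

```cpp
#include <cassert>
#include <cstdint>

// Number of 32-bit words needed to hold `bits` bits, rounded up.
inline std::uint32_t wordsForBits(std::uint32_t bits) {
    assert(bits > 0);
    return (bits + 31U) / 32U;
}
```

Reserving a count derived from the wrong unit (for example one word per bit group eight times smaller than intended) would leave the buffer sparse, which is consistent with the extra D-cache misses the commit mentions.
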
Wilson Snyder afa8e4c786 Internals: Favor const_iterator. No functional change. 2020-04-11 10:54:42 -04:00
Wilson Snyder 1a6c2fc55d Fix class members getting misoptimized away. 2020-04-10 21:10:21 -04:00