From a9ff0a0f32b8f9caf89af607c02e350e5fbbf8f6 Mon Sep 17 00:00:00 2001 From: Wilson Snyder Date: Fri, 9 Dec 2022 23:16:14 -0500 Subject: [PATCH] docs: Fix grammar --- README.rst | 6 +- docs/guide/changes.rst | 2 +- docs/guide/contributors.rst | 12 +- docs/guide/copyright.rst | 4 +- docs/guide/example_common_install.rst | 2 +- docs/guide/overview.rst | 20 +-- docs/internals.rst | 224 +++++++++++++------------- 7 files changed, 134 insertions(+), 136 deletions(-) diff --git a/README.rst b/README.rst index dbc308a47..02b0593e1 100644 --- a/README.rst +++ b/README.rst @@ -32,7 +32,7 @@ Welcome to Verilator * Single- and multithreaded output models * - **Widely Used** * Wide industry and academic deployment - * Out-of-the-box support from Arm, and RISC-V vendor IP + * Out-of-the-box support from Arm and RISC-V vendor IP - |verilator usage| * - |verilator community| - **Community Driven & Openly Licensed** @@ -62,7 +62,7 @@ performs the design simulation. Verilator also supports linking Verilated generated libraries, optionally encrypted, into other simulators. Verilator may not be the best choice if you are expecting a full-featured -replacement for a closed-source Verilog simulator, need SDF annotation, +replacement for a closed-source Verilog simulator, needs SDF annotation, mixed-signal simulation, or are doing a quick class project (we recommend `Icarus Verilog`_ for classwork.) However, if you are looking for a path to migrate SystemVerilog to C++/SystemC, or want high-speed simulation of @@ -101,7 +101,7 @@ For more information: - `Verilator manual (HTML) `_, or `Verilator manual (PDF) `_ -- `Subscribe to verilator announcements +- `Subscribe to Verilator announcements `_ - `Verilator forum `_ diff --git a/docs/guide/changes.rst b/docs/guide/changes.rst index 3688dc00e..f8baf257a 100644 --- a/docs/guide/changes.rst +++ b/docs/guide/changes.rst @@ -10,7 +10,7 @@ Revision History "Revision History" in the sidebar. 
Changes are contained in the :file:`Changes` file of the distribution, and -also summarized below. To subscribe to new versions see `Verilator +also summarized below. To subscribe to new versions, see `Verilator Announcements `_. .. include:: ../_build/gen/Changes diff --git a/docs/guide/contributors.rst b/docs/guide/contributors.rst index 290619929..3d82e0b62 100644 --- a/docs/guide/contributors.rst +++ b/docs/guide/contributors.rst @@ -137,11 +137,11 @@ Historical Origins Verilator was conceived in 1994 by Paul Wasson at the Core Logic Group at Digital Equipment Corporation. The Verilog code that was converted to C -was then merged with a C based CPU model of the Alpha processor and -simulated in a C based environment called CCLI. +was then merged with a C-based CPU model of the Alpha processor and +simulated in a C-based environment called CCLI. -In 1995 Verilator started being used also for Multimedia and Network -Processor development inside Digital. Duane Galbi took over active +In 1995 Verilator started being also used for Multimedia and Network +Processor development inside Digital. Duane Galbi took over the active development of Verilator, and added several performance enhancements. CCLI was still being used as the shell. @@ -149,7 +149,7 @@ In 1998, through the efforts of existing DECies, mainly Duane Galbi, Digital graciously agreed to release the source code. (Subject to the code not being resold, which is compatible with the GNU Public License.) -In 2001, Wilson Snyder took the kit, and added a SystemC mode, and called +In 2001, Wilson Snyder took the kit, added a SystemC mode, and called it Verilator2. This was the first packaged public release. In 2002, Wilson Snyder created Verilator 3.000 by rewriting Verilator from @@ -168,5 +168,5 @@ fork/join, delay handling, DFG performance optimizations, and other improvements. 
Currently, various language features and performance enhancements are added -as the need arises, with a focus towards getting to full Universal +as the need arises, with a focus on getting to complete Universal Verification Methodology (UVM, IEEE 1800.2-2017) support. diff --git a/docs/guide/copyright.rst b/docs/guide/copyright.rst index 50bccc55f..245141c1d 100644 --- a/docs/guide/copyright.rst +++ b/docs/guide/copyright.rst @@ -13,7 +13,7 @@ can redistribute it and/or modify the Verilator internals under the terms of either the GNU Lesser General Public License Version 3 or the Perl Artistic License Version 2.0. -All Verilog and C++/SystemC code quoted within this documentation file are -released as Creative Commons Public Domain (CC0). Many example files and +All Verilog and C++/SystemC code quoted within this documentation file is +released as Creative Commons Public Domain (CC0). Many example files and test files are likewise released under CC0 into effectively the Public Domain as described in the files themselves. diff --git a/docs/guide/example_common_install.rst b/docs/guide/example_common_install.rst index 07697c4f3..c834a7cda 100644 --- a/docs/guide/example_common_install.rst +++ b/docs/guide/example_common_install.rst @@ -1,7 +1,7 @@ .. Copyright 2003-2022 by Wilson Snyder. .. SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0 -First you need Verilator installed, see :ref:`Installation`. In brief, if +First you need Verilator installed, see :ref:`Installation`. In brief, if you installed Verilator using the package manager of your operating system, or did a :command:`make install` to place Verilator into your default path, you do not need anything special in your environment, and should not have diff --git a/docs/guide/overview.rst b/docs/guide/overview.rst index d69100814..3661883fb 100644 --- a/docs/guide/overview.rst +++ b/docs/guide/overview.rst @@ -8,8 +8,8 @@ Overview Welcome to Verilator! 
The Verilator package converts Verilog [#]_ and SystemVerilog [#]_ hardware -description language (HDL) designs into a C++ or SystemC model that after -compiling can be executed. Verilator is not a traditional simulator, but a +description language (HDL) designs into a C++ or SystemC model that, after +compiling, can be executed. Verilator is not a traditional simulator, but a compiler. Verilator is typically used as follows: @@ -18,13 +18,13 @@ Verilator is typically used as follows: to GCC, or other simulators such as Cadence Verilog-XL/NC-Verilog, or Synopsys VCS. Verilator reads the specified SystemVerilog code, lints it, optionally adds coverage and waveform tracing support, and compiles the -design into a source level multithreaded C++ or SystemC "model". The +design into a source-level multithreaded C++ or SystemC "model". The resulting model's C++ or SystemC code is output as .cpp and .h files. This -is referred to as "Verilating" and the process is "to Verilate"; the output -is a "Verilated" model. +is referred to as "Verilating", and the process is "to Verilate"; the +output is a "Verilated" model. -2. For simulation, a small user written C++ wrapper file is required, the -"wrapper". This wrapper defines the C++ standard function "main()" which +2. For simulation, a small-user written C++ wrapper file is required, the +"wrapper". This wrapper defines the C++ standard function "main()", which instantiates the Verilated model as a C++/SystemC object. 3. The user C++ wrapper, the files created by Verilator, a "runtime @@ -44,12 +44,12 @@ The best place to get started is to try the :ref:`Examples`. .. [#] Verilog is defined by the `Institute of Electrical and Electronics Engineers (IEEE) Standard for Verilog Hardware Description Language`, Std. 1364, released in 1995, 2001, and 2005. The - Verilator documentation uses the shorthand e.g. "IEEE 1394-2005" to - refer to the e.g. 2005 version of this standard. 
+ Verilator documentation uses the shorthand, e.g., "IEEE 1394-2005", + to refer to the e.g. 2005 version of this standard. .. [#] SystemVerilog is defined by the `Institute of Electrical and Electronics Engineers (IEEE) Standard for SystemVerilog - Unified Hardware Design, Specification, and Verification Language`, Standard 1800, released in 2005, 2009, 2012, and 2017. The Verilator - documentation uses the shorthand e.g. "IEEE 1800-2017" to refer to + documentation uses the shorthand e.g., "IEEE 1800-2017", to refer to the e.g. 2017 version of this standard. diff --git a/docs/internals.rst b/docs/internals.rst index f4aa386a6..47801dac1 100644 --- a/docs/internals.rst +++ b/docs/internals.rst @@ -39,12 +39,12 @@ The main flow of Verilator can be followed by reading the Verilator.cpp 3. Cells in the AST first linked, which will read and parse additional files as above. -4. Functions, variable and other references are linked to their +4. Functions, variable, and other references are linked to their definitions. 5. Parameters are resolved, and the design is elaborated. -6. Verilator then performs many additional edits and optimizations on +6. Verilator then performs additional edits and optimizations on the hierarchical design. This includes coverage, assertions, X elimination, inlining, constant propagation, and dead code elimination. @@ -56,15 +56,15 @@ The main flow of Verilator can be followed by reading the Verilator.cpp a single scope and single VarScope for each variable. A module that occurs twice will have a scope for each occurrence, and two VarScopes for each variable. This allows optimizations to proceed - across the flattened design, while still preserving the hierarchy. + across the flattened design while still preserving the hierarchy. 8. Additional edits and optimizations proceed on the pseudo-flat design. 
These include module references, function inlining, loop unrolling, variable lifetime analysis, lookup table creation, always - splitting, and logic gate simplifications (pushing inverters, etc). + splitting, and logic gate simplifications (pushing inverters, etc.). 9. Verilator orders the code. Best case, this results in a single - "eval" function which has all always statements flowing from top to + "eval" function, which has all always statements flowing from top to bottom with no loops. 10. Verilator mostly removes the flattening, so that code may be shared @@ -95,14 +95,14 @@ this. Each ``AstNode`` has pointers to up to four children, accessed by the ``op1p`` through ``op4p`` methods. These methods are then abstracted in a -specific Ast\* node class to a more specific name. For example with the +specific Ast\* node class to a more specific name. For example, with the ``AstIf`` node (for ``if`` statements), ``thensp`` calls ``op2p`` to give the pointer to the AST for the "then" block, while ``elsesp`` calls ``op3p`` to give the pointer to the AST for the "else" block, or NULL if there is not one. These accessors are automatically generated by ``astgen`` after parsing the ``@astgen`` directives in the specific ``AstNode`` subclasses. -``AstNode`` has the concept of a next and previous AST - for example the +``AstNode`` has the concept of a next and previous AST - for example, the next and previous statements in a block. Pointers to the AST for these statements (if they exist) can be obtained using the ``back`` and ``next`` methods. @@ -136,7 +136,7 @@ the pass. A number of passes use graph algorithms, and the class ``V3Graph`` is provided to represent those graphs. Graphs are directed, and algorithms are -provided to manipulate the graphs and to output them in `GraphViz +provided to manipulate the graphs and output them in `GraphViz `__ dot format. ``V3Graph.h`` provides documentation of this class. @@ -150,7 +150,7 @@ algorithms for ordering the graph. 
A generic ``user``/``userp`` member variable is also provided. Virtual methods are provided to specify the name, color, shape, and style -to be used in dot output. Typically, users provide derived classes from +to be used in dot output. Typically users provide derived classes from ``V3GraphVertex`` which will reimplement these methods. Iterators are provided to access in and out edges. Typically these are used @@ -173,9 +173,9 @@ vertices. Edges have an associated ``weight`` and may also be made Accessors, ``fromp`` and ``top`` return the "from" and "to" vertices respectively. -Virtual methods are provided to specify the label, color and style to be +Virtual methods are provided to specify the label, color, and style to be used in dot output. Typically users provided derived classes from -``V3GraphEdge`` which will reimplement these methods. +``V3GraphEdge``, which will reimplement these methods. ``V3GraphAlg`` @@ -183,7 +183,7 @@ used in dot output. Typically users provided derived classes from This is the base class for graph algorithms. It implements a ``bool`` method, ``followEdge`` which algorithms can use to decide whether an edge -is followed. This method returns true if the graph edge has weight greater +is followed. This method returns true if the graph edge has a weight greater than one and a user function, ``edgeFuncp`` (supplied in the constructor) returns ``true``. @@ -194,11 +194,11 @@ provided and documented in ``V3GraphAlg.cpp``. ``DfgGraph`` ^^^^^^^^^^^^^ -The data-flow graph based combinational logic optimizer (DFG optimizer) +The data-flow graph-based combinational logic optimizer (DFG optimizer) converts an ``AstModule`` into a ``DfgGraph``. The graph represents the combinational equations (~continuous assignments) in the module, and for the duration of the DFG passes, it takes over the role of the represented -``AstModule``. The ``DfgGraph`` keeps holds of the represented ``AstModule``, +``AstModule``. 
The ``DfgGraph`` keeps hold of the represented ``AstModule``, and the ``AstModule`` retains all other logic that is not representable as a data-flow graph. At the end of optimization, the combinational logic represented by the ``DfgGraph`` is converted back into AST form and is @@ -212,7 +212,7 @@ writing DFG passes easier. The ``DfgGraph`` represents combinational logic equations as a graph of ``DfgVertex`` vertices. Each sub-class of ``DfgVertex`` corresponds to an -expression (a sub-class of ``AstNodeExpr``), a constanat, or a variable +expression (a sub-class of ``AstNodeExpr``), a constant, or a variable reference. LValues and RValues referencing the same storage location are represented by the same ``DfgVertex``. Consumers of such vertices read as the LValue, writers of such vertices write the RValue. The bulk of the final @@ -225,11 +225,11 @@ Scheduling Verilator implements the Active and NBA regions of the SystemVerilog scheduling model as described in IEEE 1800-2017 chapter 4, and in particular sections -4.5 and Figure 4.1. The static (verilation time) scheduling of SystemVerilog +4.5 and Figure 4.1. The static (Verilation time) scheduling of SystemVerilog processes is performed by code in the ``V3Sched`` namespace. The single -entry-point to the scheduling algorithm is ``V3Sched::schedule``. Some +entry point to the scheduling algorithm is ``V3Sched::schedule``. Some preparatory transformations important for scheduling are also performed in -``V3Active`` and ``V3ActiveTop``. High level evaluation functions are +``V3Active`` and ``V3ActiveTop``. High-level evaluation functions are constructed by ``V3Order``, which ``V3Sched`` invokes on subsets of the logic in the design. @@ -267,8 +267,8 @@ The classes of logic we distinguish between are: below. - Clocked logic. Any process or construct that has an explicit sensitivity - list, with no implicit sensitivities is considered 'clocked' (or - 'sequential') logic. 
This includes among other things ``always`` and + list, with no implicit sensitivities, is considered 'clocked' (or + 'sequential') logic. This includes, among other things ``always`` and ``always_ff`` processes with an explicit sensitivity list. Note that the distinction between clocked logic and combinational logic is only @@ -321,7 +321,7 @@ At the highest level, ordering is performed by ``V3Order::order``, which is invoked by ``V3Sched::schedule`` on various subsets of the combinational and clocked logic as described below. The important thing to highlight now is that ``V3Order::order`` operates by assuming that the state of all variables driven -by combinational logic are consistent with that combinational logic. While this +by combinational logic is consistent with that combinational logic. While this might seem subtle, it is very important, so here is an example: :: @@ -335,7 +335,7 @@ first, and all downstream combinational logic (like the assignment to ``d``) will execute after the clocked logic that drives inputs to the combinational logic, in data-flow (or dependency) order. At the end of the evaluation step, this ordering restores the invariant that variables driven by combinational -logic are consistent with that combinational logic (i.e.: the circuit is in a +logic are consistent with that combinational logic (i.e., the circuit is in a settled/steady state). One of the most important optimizations for performance is to only evaluate @@ -344,12 +344,12 @@ point in evaluating the above assignment to ``d`` on a negative edge of the clock signal. Verilator does this by pushing the combinational logic into the same (possibly multiple) event domains as the logic driving the inputs to that combinational logic, and only evaluating the combinational logic if at least -one driving domains have been triggered. The impact of this activity gating is +one driving domain has been triggered. 
The impact of this activity gating is very high (observed 100x slowdown on large designs when turning it off), it is the reason we prefer to convert clocked logic to combinational logic in ``V3Active`` whenever possible. -The ordering procedure described above works straight forward unless there are +The ordering procedure described above works straightforward unless there are combinational logic constructs that are circularly dependent (a.k.a.: the UNOPTFLAT warning). Combinational scheduling loops can arise in sound (realizable) circuits as Verilator considers each SystemVerilog process as a @@ -369,7 +369,7 @@ To achieve this, ``V3Sched::schedule`` calls ``V3Sched::breakCycles``, which builds a dependency graph of all combinational logic in the design, and then breaks all combinational cycles by converting all combinational logic that consumes a variable driven via a 'back-edge' into hybrid logic. Here -'back-edge' just means a graph edge that points from a higher rank vertex to a +'back-edge' just means a graph edge that points from a higher-rank vertex to a lower rank vertex in some consistent ranking of the directed graph. Variables driven via a back-edge in the dependency graph are marked, and all combinational logic that depends on such variables is converted into hybrid @@ -382,7 +382,7 @@ logic, with two exceptions: - Explicit sensitivities of hybrid logic are ignored for the purposes of data-flow ordering with respect to other combinational or hybrid logic. I.e.: an explicit sensitivity suppresses the implicit sensitivity on the same - variable. This cold also be interpreted as ordering the hybrid logic as if + variable. This could also be interpreted as ordering the hybrid logic as if all variables listed as explicit sensitivities were substituted as constants with their current values. @@ -396,7 +396,7 @@ explicit sensitivities are triggered. 
The effect of this transformation is that ``V3Order`` can proceed as if there are no combinational cycles (or alternatively, under the assumption that the -back-edge driven variables don't change during one evaluation pass). The +back-edge-driven variables don't change during one evaluation pass). The evaluation loop invoking the ordered code, will then re-invoke it on a follow on iteration, if any of the explicit sensitivities of hybrid logic have actually changed due to the previous invocation, iterating until all the @@ -422,8 +422,8 @@ combinationally driven variables are consistent with the combinational logic. To achieve this, we invoke ``V3Order::order`` on all of the combinational and hybrid logic, and iterate the resulting evaluation function until no more -hybrid logic is triggered. This yields the `_eval_settle` function which is -invoked at the beginning of simulation, after the `_eval_initial`. +hybrid logic is triggered. This yields the `_eval_settle` function, which is +invoked at the beginning of simulation after the `_eval_initial`. Partitioning logic for correct NBA updates @@ -432,17 +432,17 @@ Partitioning logic for correct NBA updates ``V3Order`` can order logic corresponding to non-blocking assignments (NBAs) to yield correct simulation results, as long as all the sensitivity expressions of clocked logic triggered in the Active scheduling region of the current time -step are known up front. I.e.: the ordering of NBA updates is only correct if +step are known up front. I.e., the ordering of NBA updates is only correct if derived clocks that are computed in an Active region update (that is, via a blocking or continuous assignment) are known up front. We can ensure this by partitioning the logic into two regions. 
Note these -regions are a concept of the Verilator scheduling algorithm and they do not +regions are a concept of the Verilator scheduling algorithm, and they do not directly correspond to the similarly named SystemVerilog scheduling regions as defined in the standard: - All logic (clocked, combinational and hybrid) that transitively feeds into, - or drives, via a non-blocking or continuous assignments (or via any update + or drives via a non-blocking or continuous assignments (or via any update that SystemVerilog executes in the Active scheduling region), a variable that is used in the explicit sensitivity list of some clocked or hybrid logic, is assigned to the 'act' region. @@ -450,10 +450,10 @@ as defined in the standard: - All other logic is assigned to the 'nba' region. For completeness, note that a subset of the 'act' region logic, specifically, -the logic related to the pre-assignments of NBA updates (i.e.: AstAssignPre +the logic related to the pre-assignments of NBA updates (i.e., AstAssignPre nodes), is handled separately, but is executed as part of the 'act' region. -Also note that all logic representing the committing of an NBA (i.e.: Ast*Post) +Also note that all logic representing the committing of an NBA (i.e., Ast*Post) nodes) will be in the 'nba' region. This means that the evaluation of the 'act' region logic will not commit any NBA updates. As a result, the 'act' region logic can be iterated to compute all derived clock signals up front. @@ -462,7 +462,7 @@ The correspondence between the SystemVerilog Active and NBA scheduling regions, and the internal 'act' and 'nba' regions, is that 'act' contains all Active region logic that can compute a clock signal, while 'nba' contains all other Active and NBA region logic. For example, if the only clocks in the design are -top level inputs, then 'act' will be empty, and 'nba' will contain the whole of +top-level inputs, then 'act' will be empty, and 'nba' will contain the whole of the design. 
The partitioning described above is performed by ``V3Sched::partition``. @@ -475,10 +475,10 @@ We will separately invoke ``V3Order::order`` on the 'act' and 'nba' region logic. Combinational logic that reads variables driven from both 'act' and 'nba' -region logic has the problem of needing to be re-evaluated even if only one of +region logic has the problem of needing to be reevaluated even if only one of the regions updates an input variable. We could pass additional trigger expressions between the regions to make sure combinational logic is always -re-evaluated, or we can replicate combinational logic that is driven from +reevaluated, or we can replicate combinational logic that is driven from multiple regions, by copying it into each region that drives it. Experiments show this simple replication works well performance-wise (and notably ``V3Combine`` is good at combining the replicated code), so this is what we do @@ -506,7 +506,7 @@ the top level `_eval` function, which on the high level has the form: :: void _eval() { - // Update combinational logic dependent on top level inptus ('ico' region) + // Update combinational logic dependent on top level inputs ('ico' region) while (true) { _eval__triggers__ico(); // If no 'ico' region trigger is active @@ -534,7 +534,7 @@ the top level `_eval` function, which on the high level has the form: // If no 'nba' region trigger is active if (!nba_triggers.any()) break; - // Evaluate all other Active region logic, and commti NBAs + // Evaluate all other Active region logic, and commit NBAs _eval_nba(); } } @@ -628,7 +628,7 @@ coroutines ``co_await`` its ``join`` function, and forked ones call ``done`` when they're finished. Once the required number of coroutines (set using ``setCounter``) finish execution, the forking coroutine is resumed. 
-Awaitable utilities +Awaitable Utilities ^^^^^^^^^^^^^^^^^^^ There are also two small utility awaitable types: @@ -639,7 +639,7 @@ There are also two small utility awaitable types: * ``VlForever`` is used for blocking a coroutine forever. See the `Timing pass` section for more detail. -Timing pass +Timing Pass ^^^^^^^^^^^ The visitor in ``V3Timing.cpp`` transforms each timing control into a ``co_await``. @@ -668,7 +668,7 @@ before them and stored in temporary variables. and then await changes in variables used in the condition. If the condition is always false, the ``wait`` statement is replaced by a ``co_await`` on a ``VlForever``. This is done instead of a return in case the ``wait`` is deep in -a call stack (otherwise the coroutine's caller would continue execution). +a call stack (otherwise, the coroutine's caller would continue execution). Each sub-statement of a ``fork`` is put in an ``AstBegin`` node for easier grouping. In a later step, each of these gets transformed into a new, separate @@ -748,7 +748,7 @@ doesn't suspend the forking process. In forked processes, references to local variables are only allowed in ``fork..join``, as this is the only case that ensures the lifetime of these -locals is at least as long as the execution of the forked processes. This is +locals are at least as long as the execution of the forked processes. This is where ``VlNow`` is used, to ensure the locals are moved to the heap before they are passed by reference to the forked processes. @@ -770,7 +770,7 @@ graph, while maintaining as much available parallelism as possible. Often the partitioner can transform an input graph with millions of nodes into a coarsened execution graph with a few dozen nodes, while maintaining enough parallelism to take advantage of a modern multicore CPU. Runtime -synchronization cost is not prohibitive with so few nodes. +synchronization cost is reasonable with so few nodes. 
Partitioning @@ -789,7 +789,7 @@ The available parallelism or "par-factor" of a DAG is the total cost to execute all nodes, divided by the cost to execute the longest critical path through the graph. This is the speedup you would get from running the graph in parallel, if given infinite CPU cores available and communication and -synchronization are zero. +synchronization is zero. Macro Task @@ -847,7 +847,7 @@ synchronization costs. Verilator's cost estimates are assigned by ``InstrCountVisitor``. This class is perhaps the most fragile piece of the multithread implementation. It's easy to have a bug where you count something cheap -(eg. accessing one element of a huge array) as if it were expensive (eg. +(e.g. accessing one element of a huge array) as if it were expensive (eg. by counting it as if it were an access to the entire array.) Even without such gross bugs, the estimates this produce are only loosely predictive of actual runtime cost. Multithread performance would be better with better @@ -879,13 +879,13 @@ fragmentation. Locating Variables for Best Spatial Locality ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -After scheduling all code, we attempt to locate variables in memory such +After scheduling all code, we attempt to locate variables in memory, such that variables accessed by a single macro-task are close together in memory. This provides "spatial locality" - when we pull in a 64-byte cache line to access a 2-byte variable, we want the other 62 bytes to be ones we'll also likely access soon, for best cache performance. -This turns out to be critical for performance. It should allow Verilator +This is critical for performance. It should allow Verilator to scale to very large models. We don't rely on our working set fitting in any CPU cache; instead we essentially "stream" data into caches from memory. 
It's not literally streaming, where the address increases @@ -904,7 +904,7 @@ The footprint ordering is literally the traveling salesman problem, and we use a TSP-approximation algorithm to get close to an optimal sort. This is an old idea. Simulators designed at DEC in the early 1990s used -similar techniques to optimize both single-thread and multi-thread +similar techniques to optimize both single-thread and multithread modes. (Verilator does not optimize variable placement for spatial locality in serial mode; that is a possible area for improvement.) @@ -918,7 +918,7 @@ Wave Scheduling To allow the Verilated model to run in parallel with the testbench, it might be nice to support "wave" scheduling, in which work on a cycle begins -before ``eval()`` is called or continues after ``eval()`` returns. For now +before ``eval()`` is called or continues after ``eval()`` returns. For now, all work on a cycle happens during the ``eval()`` call, leaving Verilator's threads idle while the testbench (everything outside ``eval()``) is working. This would involve fundamental changes within the partitioner, @@ -929,7 +929,7 @@ Efficient Dynamic Scheduling """""""""""""""""""""""""""" To scale to more than a few threads, we may revisit a fully dynamic -scheduler. For large (>16 core) systems it might make sense to dedicate an +scheduler. For large (>16 core) systems, it might make sense to dedicate an entire core to scheduling, so that scheduler data structures would fit in its L1 cache and thus the cost of traversing priority-ordered ready lists would not be prohibitive. @@ -983,7 +983,7 @@ Performance Regression """""""""""""""""""""" It would be nice if we had a regression of large designs, with some -diversity of design styles, to test on both single- and multi-threaded +diversity of design styles, to test on both single- and multithreaded modes. 
This would help to avoid performance regressions, and also to evaluate the optimizations while minimizing the impact of parasitic noise. @@ -992,7 +992,7 @@ Per-Instance Classes """""""""""""""""""" If we have multiple instances of the same module, and they partition -differently (likely; we make no attempt to partition them the same) then +differently (likely; we make no attempt to partition them the same), then the variable sort will be suboptimal for either instance. A possible improvement would be to emit an unique class for each instance of a module, and sort its variables optimally for that instance's code stream. @@ -1011,17 +1011,17 @@ until all signals are stable. On other evaluations, the Verilated code detects what input signals have changes. If any are clocks, it calls the appropriate sequential functions (from ``always @ posedge`` statements). Interspersed with sequential -functions it calls combo functions (from ``always @*``). After this is +functions, it calls combo functions (from ``always @*``). After this is complete, it detects any changes due to combo loops or internally generated clocks, and if one is found must reevaluate the model again. For SystemC code, the ``eval()`` function is wrapped in a SystemC -``SC_METHOD``, sensitive to all inputs. (Ideally it would only be sensitive +``SC_METHOD``, sensitive to all inputs. (Ideally, it would only be sensitive to clocks and combo inputs, but tracing requires all signals to cause evaluation, and the performance difference is small.) If tracing is enabled, a callback examines all variables in the design for -changes, and writes the trace for each change. To accelerate this process +changes, and writes the trace for each change. To accelerate this process, the evaluation process records a bitmask of variables that might have changed; if clear, checking those signals for changes may be skipped. 
@@ -1045,7 +1045,7 @@ is appreciated if you could match our style: - Use "mixedCapsSymbols" instead of "underlined_symbols". -- Uas a "p" suffix on variables that are pointers, e.g. "nodep". +- Use a "p" suffix on variables that are pointers, e.g., "nodep". - Comment every member variable. @@ -1057,12 +1057,12 @@ using clang-format version 10.0.0, and yapf for python, and is automatically corrected in the CI actions. For those manually formatting C code: -- Use 4 spaces per level, and no tabs. +- Use four spaces per level, and no tabs. -- Use 2 spaces between the end of source and the beginning of a +- Use two spaces between the end of source and the beginning of a comment. -- Use 1 space after if/for/switch/while and similar keywords. +- Use one space after if/for/switch/while and similar keywords. - No spaces before semicolons, nor between a function's name and open parenthesis (only applies to functions; if/else has a following space). @@ -1073,8 +1073,8 @@ The ``astgen`` Script The ``astgen`` script is used to generate some of the repetitive C++ code related to the ``AstNode`` type hierarchy. An example is the abstract ``visit`` -methods in ``VNVisitor``. There are other uses, please see the ``*__gen*`` -files in the bulid directories and the ``astgen`` script itself for details. A +methods in ``VNVisitor``. There are other uses; please see the ``*__gen*`` +files in the build directories and the ``astgen`` script for details. A description of the more advanced features of ``astgen`` are provided here. @@ -1099,7 +1099,7 @@ sub-class definitions are parsed and contribute to the code generated by ``astgen``. The general syntax is ``@astgen := ``, where ```` determines what is being defined, and ```` is a ```` dependent description of the definition. 
The list of -``@astgen`` directives is as follows: +``@astgen`` directives are as follows: ``op`` operand directives @@ -1128,7 +1128,7 @@ An example of the full syntax of the directive is ``astnode`` generates accessors for the child nodes based on these directives. For non-list children, the names of the getter and setter both are that of the -given ````. For list type children, the getter is ````, +given ````. For list-type children, the getter is ````, and instead of the setter, there an ``add`` method is generated that appends new nodes (or lists of nodes) to the child list. @@ -1185,10 +1185,10 @@ and applies the visit method of the ``VNVisitor`` to the invoking AstNode instance (i.e. ``this``). One possible difficulty is that a call to ``accept`` may perform an edit -which destroys the node it receives as argument. The +which destroys the node it receives as an argument. The ``acceptSubtreeReturnEdits`` method of ``AstNode`` is provided to apply ``accept`` and return the resulting node, even if the original node is -destroyed (if it is not destroyed it will just return the original node). +destroyed (if it is not destroyed, it will just return the original node). The behavior of the visitor classes is achieved by overloading the ``visit`` function for the different ``AstNode`` derived classes. If a @@ -1212,7 +1212,7 @@ There are three ways data is passed between visitor functions. it's cleared. Children under an ``AstModule`` will see it set, while nodes elsewhere will see it clear. If there can be nested items (for example an ``AstFor`` under an ``AstFor``) the variable needs to be - save-set-restored in the ``AstFor`` visitor, otherwise exiting the + save-set-restored in the ``AstFor`` visitor; otherwise exiting the lower for will lose the upper for's setting. 2. User attributes. Each ``AstNode`` (**Note.** The AST node, not the @@ -1243,14 +1243,14 @@ There are three ways data is passed between visitor functions. 
These comments are important to make sure a ``user#()`` on a given ``AstNode`` type is never being used for two different purposes. - Note that calling ``user#ClearTree`` is fast, it doesn't walk the + Note that calling ``user#ClearTree`` is fast; it doesn't walk the tree, so it's ok to call fairly often. For example, it's commonly called on every module. 3. Parameters can be passed between the visitors in close to the "normal" function caller to callee way. This is the second ``vup`` parameter of type ``AstNUser`` that is ignored on most of the visitor - functions. V3Width does this, but it proved more messy than the above + functions. V3Width does this, but it proved messier than the above and is deprecated. (V3Width was nearly the first module written. Someday this scheme may be removed, as it slows the program down to have to pass vup everywhere.) @@ -1305,7 +1305,7 @@ change. For example: iterateAndNextNull(nodep->lhsp()); Will work fine, as even if the first iterate causes a new node to take -the place of the ``lhsp()``, that edit will update ``nodep->lhsp()`` and +the place of the ``lhsp()``, that edit will update ``nodep->lhsp()``, and the second call will correctly see the change. Alternatively: :: @@ -1318,8 +1318,8 @@ the second call will correctly see the change. Alternatively: This will cause bugs or a core dump, as lp is a dangling pointer. Thus it is advisable to set lhsp=NULL shown in the \*'s above to make sure -these dangles are avoided. Another alternative used in special cases -mostly in V3Width is to use acceptSubtreeReturnEdits, which operates on +these dangles are avoided. Another alternative used in special cases, +mostly in V3Width, is to use acceptSubtreeReturnEdits, which operates on a single node and returns the new pointer if any. Note acceptSubtreeReturnEdits does not follow ``nextp()`` links. 
@@ -1332,7 +1332,7 @@ Identifying Derived Classes --------------------------- A common requirement is to identify the specific ``AstNode`` class we -are dealing with. For example a visitor might not implement separate +are dealing with. For example, a visitor might not implement separate ``visit`` methods for ``AstIf`` and ``AstGenIf``, but just a single method for the base class: @@ -1355,7 +1355,7 @@ use: Additionally the ``VN_CAST`` method converts pointers similar to C++ ``dynamic_cast``. This either returns a pointer to the object cast to that type (if it is of class ``SOMETYPE``, or a derived class of -``SOMETYPE``) or else NULL. (However, for true/false tests use ``VN_IS`` +``SOMETYPE``) or else NULL. (However, for true/false tests, use ``VN_IS`` as that is faster.) @@ -1364,13 +1364,13 @@ as that is faster.) Testing ======= -For an overview of how to write a test see the BUGS section of the +For an overview of how to write a test, see the BUGS section of the `Verilator Manual `_. It is important to add tests for failures as well as success (for example to check that an error message is correctly triggered). -Tests that fail should by convention have the suffix ``_bad`` in their +Tests that fail should, by convention, have the suffix ``_bad`` in their name, and include ``fails = 1`` in either their ``compile`` or ``execute`` step as appropriate. @@ -1378,11 +1378,11 @@ name, and include ``fails = 1`` in either their ``compile`` or Preparing to Run Tests ---------------------- -For all tests to pass you must install the following packages: +For all tests to pass, you must install the following packages: - SystemC to compile the SystemC outputs, see http://systemc.org -- Parallel::Forker from CPAN to run tests in parallel, you can install +- Parallel::Forker from CPAN to run tests in parallel; you can install this with e.g. "sudo cpan install Parallel::Forker". - vcddiff to find differences in VCD outputs. 
See the readme at @@ -1417,9 +1417,9 @@ This can be changed using the ``top_filename`` subroutine, for example top_filename("t/t_myothertest.v"); -By default all tests will run with major simulators (Icarus Verilog, NC, -VCS, ModelSim, etc) as well as Verilator, to allow results to be -compared. However if you wish a test only to be used with Verilator, you +By default, all tests will run with major simulators (Icarus Verilog, NC, +VCS, ModelSim, etc.) as well as Verilator, to allow results to be +compared. However, if you wish a test only to be used with Verilator, you can use the following: :: @@ -1435,7 +1435,7 @@ Of the many options that can be set through arguments to ``compiler`` and ``fails`` Set to 1 to indicate that the compilation or execution is intended to fail. -For example the following would specify that compilation requires two +For example, the following would specify that compilation requires two defines and is expected to fail. :: @@ -1452,15 +1452,15 @@ Regression Testing for Developers Developers will also want to call ./configure with two extra flags: ``--enable-ccwarn`` - Causes the build to stop on warnings as well as errors. A good way to - ensure no sloppy code gets added, however it can be painful when it + This causes the build to stop on warnings as well as errors. A good way + to ensure no sloppy code gets added; however, it can be painful when it comes to testing, since third party code used in the tests (e.g. SystemC) may not be warning free. ``--enable-longtests`` In addition to the standard C, SystemC examples, also run the tests in the ``test_regress`` directory when using *make test*'. This is - disabled by default as SystemC installation problems would otherwise + disabled by default, as SystemC installation problems would otherwise falsely indicate a Verilator problem. 
When enabling the long tests, some additional PERL modules are needed, @@ -1477,7 +1477,7 @@ There are some traps to avoid when running regression tests - Not all Linux systems install Perldoc by default. This is needed for the ``--help`` option to Verilator, and also for regression testing. This - can be installed using cpan: + can be installed using CPAN: :: @@ -1489,8 +1489,8 @@ There are some traps to avoid when running regression tests - Running regression may exhaust resources on some Linux systems, particularly file handles and user processes. Increase these to - respectively 16,384 and 4,096. The method of doing this is system - dependent, but on Fedora Linux it would require editing the + respectively 16,384 and 4,096. The method of doing this is + system-dependent, but on Fedora Linux it would require editing the ``/etc/security/limits.conf`` file as root. @@ -1510,7 +1510,7 @@ Continuous Integration Verilator uses GitHub Actions which automatically tests the master branch for test failures on new commits. It also runs a daily cron job to validate -all of the tests against different OS and compiler versions. +all tests against different OS and compiler versions. Developers can enable Actions on their GitHub repository so that the CI environment can check their branches too by enabling the build workflow: @@ -1555,7 +1555,7 @@ debug level 5, with the V3Width.cpp file at level 9. --debug ------- -When you run with ``--debug`` there are two primary output file types +When you run with ``--debug``, there are two primary output file types placed into the obj_dir, .tree and .dot files. @@ -1572,7 +1572,7 @@ output, for example: dot -Tps -o ~/a.ps obj_dir/Vtop_foo.dot You can then print a.ps. You may prefer gif format, which doesn't get -scaled so can be more useful with large graphs. +scaled so it can be more useful with large graphs. For interactive graph viewing consider `xdot `__ or `ZGRViewer @@ -1617,21 +1617,21 @@ field in the section below. 
+---------------+--------------------------------------------------------+ | ``w32`` | The data-type width() is 32 bits. | +---------------+--------------------------------------------------------+ -| ``out_wide`` | The name() of the node, in this case the name of the | +| ``out_wide`` | The name() of the node, in this case, the name of the | | | variable. | +---------------+--------------------------------------------------------+ | ``[O]`` | Flags which vary with the type of node, in this | -| | case it means the variable is an output. | +| | case, it means the variable is an output. | +---------------+--------------------------------------------------------+ -In more detail the following fields are dumped common to all nodes. They +In more detail, the following fields are dumped common to all nodes. They are produced by the ``AstNode::dump()`` method: Tree Hierarchy The dump lines begin with numbers and colons to indicate the child node hierarchy. As noted above, ``AstNode`` has lists of items at the same level in the AST, connected by the ``nextp()`` and ``prevp()`` - pointers. These appear as nodes at the same level. For example after + pointers. These appear as nodes at the same level. For example, after inlining: :: @@ -1655,20 +1655,20 @@ Address of the node with the debugger. If the actual address values are not important, then using the ``--dump-tree-addrids`` option will convert address values to short identifiers of the form ``([A-Z]*)``, which is - hopefully easier for the reader to cross reference throughout the + hopefully easier for the reader to cross-reference throughout the dump. Last edit number Of the form ```` or ```` , where ``nnnn`` is the number of the last edit to modify this node. The trailing ``#`` - indicates the node has been edited since the last tree dump (which - typically means in the last refinement or optimization pass). GDB can - watch for this, see << /Debugging >>. 
+ indicates the node has been edited since the last tree dump + (typically in the last refinement or optimization pass). GDB can + watch for this; see << /Debugging >>. Source file and line Of the form ``{xxnnnn}``, where C{xx} is the filename letter (or letters) and ``nnnn`` is the line number within that file. The first - file is ``a``, the 26th is ``z``, the 27th is ``aa`` and so on. + file is ``a``, the 26th is ``z``, the 27th is ``aa``, and so on. User pointers Shows the value of the node's user1p...user5p, if non-NULL. @@ -1683,7 +1683,7 @@ Data type - ``s`` if the node is signed. - - ``d`` if the node is a double (i.e a floating point entity). + - ``d`` if the node is a double (i.e. a floating point entity). - ``w`` always present, indicating this is the width field. @@ -1693,9 +1693,9 @@ Data type width. Name of the entity represented by the node if it exists - For example for a ``VAR`` it is the name of the variable. + For example, for a ``VAR`` it is the name of the variable. -Many nodes follow these fields with additional node specific +Many nodes follow these fields with additional node-specific information. Thus the ``VARREF`` node will print either ``[LV]`` or ``[RV]`` to indicate a left value or right value, followed by the node of the variable being referred to. For example: @@ -1710,7 +1710,7 @@ type in question to determine additional fields that may be printed. The ``MODULE`` has a list of ``CELLINLINE`` nodes referred to by its ``op1p()`` pointer, connected by ``nextp()`` and ``prevp()`` pointers. -Similarly the ``NETLIST`` has a list of modules referred to by its +Similarly, the ``NETLIST`` has a list of modules referred to by its ``op1p()`` pointer. @@ -1728,7 +1728,7 @@ Debugging with GDB ------------------ The test_regress/driver.pl script accepts ``--debug --gdb`` to start -Verilator under gdb and break when an error is hit or the program is about +Verilator under gdb and break when an error is hit, or the program is about to exit. 
You can also use ``--debug --gdbbt`` to just backtrace and then exit gdb. To debug the Verilated executable, use ``--gdbsim``. @@ -1805,7 +1805,7 @@ backtrace. You will typically see a frame sequence something like: Adding a New Feature ==================== -Generally what would you do to add a new feature? +Generally, what would you do to add a new feature? 1. File an issue (if there isn't already) so others know what you're working on. @@ -1823,7 +1823,7 @@ Generally what would you do to add a new feature? Ordering of definitions is enforced by ``astgen``. 5. Now you can run "test_regress/t/t_.pl --debug" and it'll - probably fail but you'll see a + probably fail, but you'll see a "test_regress/obj_dir/t_/*.tree" file which you can examine to see if the parsing worked. See also the sections above on debugging. @@ -1833,12 +1833,12 @@ Generally what would you do to add a new feature? Adding a New Pass ----------------- -For more substantial changes you may need to add a new pass. The simplest +For more substantial changes, you may need to add a new pass. The simplest way to do this is to copy the ``.cpp`` and ``.h`` files from an existing pass. You'll need to add a call into your pass from the ``process()`` function in ``src/verilator.cpp``. -To get your pass to build you'll need to add its binary filename to the +To get your pass to build, you'll need to add its binary filename to the list in ``src/Makefile_obj.in`` and reconfigure. @@ -1854,11 +1854,9 @@ IEEE 1800-2017 3.3 modules within modules IEEE 1800-2017 6.12 "shortreal" Little/no tool support, and easily promoted to real. IEEE 1800-2017 11.11 Min, typ, max - No SDF support so will always use typical. + No SDF support, so will always use typical. IEEE 1800-2017 11.12 "let" - Little/no tool support, makes difficult to implement parsers. -IEEE 1800-2017 20.15 Probabilistic functions - Little industry use. + Little/no tool support, makes it difficult to implement parsers. 
IEEE 1800-2017 20.16 Stochastic analysis Little industry use. IEEE 1800-2017 20.17 PLA modeling