A concat typically has multiple inputs. Whenever one of the input values
change the output value of the concat is updated and propagated to its
downstream consumers.
When multiple inputs change within the same cycle each input will cause a
update propagation. Depending of the overall structure of the design this
can cause a significant performance penalty.
E.g. the following synthetic structure has a exponential runtime increase
based on the value of N.
```
reg [N-1:0] x;
generate for (genvar i = 0; i < N - 1; i++)
assign x[i+1] = ^{x[i],x[i]};
endgenerate
```
To improve this defer the value propagation of the concat to the end of the
current cycle, this allows multiple input updates to be included in a
single output update.
For the example in report #1052 this reduced the runtime from 2 minutes to
essentially 0.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The implementation for partial receive for concat only differs from the
regular receive in that it takes an additional offset.
The regular receive can easily be implemented by calling the partial
receive with an offset of 0. This allows to remove some duplicated code.
The overhead of this is negligible, but to help the compiler to optimize this
a bit better mark the `recv_vec()` and `recv_vec_pv()` functions as final.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
vvp_net_ptr_t uses vvp_sub_pointer_t to implement a tagged pointer with the
tag containing the port number.
The size of the tagged pointer is that of a normal pointer and could easily
be passed in a register when passing it as an argument to a function.
But since the vvp_sub_pointer_t type has a non-standard destructor it is
instead passed on the stack with the register containing a pointer to the
stack location where the value is stored.
This creates extra boiler plate code when passing a vvp_net_ptr_t to a
function writing and reading the value to and from the stack.
Use the default destructor for vvp_sub_pointer_t to avoid this and have the
value passed in a register.
There isn't much of a performance gain but the change is simple enough to
do anyway.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The repeat functor can receive a partial vector. Make sure this is handled.
Since the expectation is that will only happen if the input wire is driven
by a single partial selection the default recv_vec4_pv_() can be used which
replaces the missing bits by `z`.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Automatic 2-state vectors currently get initialized to 'hx, while their
default value should be 0.
Make sure the vector is initialized to 0 at the beginning of the automatic
context.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The `recv_vec{4,8}_pv()` functions are used to implement a partial write to
a vector. As parameters they take both the value and the width of the
value.
All callers of of these functions pass `val.size()` or a variation thereof
as the width of the value. And all implementations that do anything with
the data have an assert that `val.size() == wid`.
Remove the `wid` parameter from these functions and just use `val.size()`
directly where needed. This allows to simplify the interface and also
to remove the asserts.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Most vvp functors need to support recv_vec4_pv. Any that are strength-aware
also need to support recv_vec8_pv. Note the simplifying assumption that is
documented in the base class recv_vec4_pv_ implementation.
"# include <string>" was added so "Microsoft Visual Studio Express 2015 RC Web" could compile it without error. "static void operator delete[](void*); was preprocessed so "Microsoft Visual Studio Express 2015 RC Web" could link it without error for a function not yet implemented.
- Have %pushi/vec4 handle some special cases optimally.
- Eliminate some duplicated method calls in %load/vec4.
- Optimize the vvp_vector4_t::copy_from_ method by inlining
some parts.
The vvp_vector2_t constructor that takes a vvp_vector4_t value was
documented as creating a NaN value if the supplied vector contained
any X or Z bits, but instead used the standard Verilog 4-state to
2-state conversion semantics (X or Z translate to 0). I've added an
optional second parameter to the constructor to allow the user to
choose which semantics they want, as both are needed.
If a strength aware net has an unambiguous HiZ1 strength, VVP treats
it as a logic '1'. It should be treated as a logic 'z'. An ambiguous
HiZ1/HiZ0 strength should also be treated as a logic 'z'.
The vvp_darray_real class cal be used for static arrays as well
and this is a more general solution anyhow. Kill the now useless
vvp_realarray_t class.
Clean up the vector4_to_value to use templates and explicit
instantiations. This makes the interface much cleaner for a
wider variety of integral types.
This patch implements the $countdrivers system function. It does not
yet support wires connected to islands (and outputs a suitable "sorry"
message when this is detected).
In vvp, create the .var/str variable for representing strings, and
handle strings in the $display system task.
Add to vvp threads the concept of a stack of strings. This is going to
be how complex objects are to me handled in the future: forth-like
operation stacks. Also add the first two instructions to minimally get
strings to work.
In the parser, handle the variable declaration and make it available
to the ivl_target.h code generator. The vvp code generator can use this
information to generate the code for new vvp support.
Full vector assigns are able to short circuit the propagation of
the value if it finds that there are no value changes. This patch
supports that behavior in writes to parts as well. Put this change
to use in logic devices as well.
This patch makes the code consistently use struct/class in the C++ files,
it removes a couple shadow warnings and where a class pointer is passed to
the C routines, it defines the pointer as a class for C++ and as struct for
C and it removes a namespace std duplication.