/*
 * Copyright (c) 2001-2013 Stephen Williams (steve@icarus.com)
 *
 */


EXECUTABLE INSTRUCTION OPCODES

Instruction opcodes all start with a % character and have 0 or more
operands. In no case are there more than 3 operands. This chapter
describes the specific behavior of each opcode, in enough detail
(I hope) that its complete effect can be predicted.

General Principles of Arithmetic (current plan):

Binary arithmetic instructions in general take three parameters: the
left operand, the right operand, and the width. The left operand is
replaced with the result, which is the same width as the left and
right operands.

General Principles of Arithmetic (new plan):

For strings, all arithmetic is stack based. That is, there is an
abstract stack of strings from which operations pull their operands
and onto which they push their results. This is somewhat like FORTH
(or the RPN notation of an HP calculator) and removes the need to keep
register addresses in operands. I may find that this leads to a more
compact form of instruction code and to more efficient operators
overall. If so, after the experience of implementing it for strings,
I'll want to convert other types to this method as well. Keep this in
mind whenever considering adding new instructions to vvp.
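To illustrate why a stack removes the need for operand addresses, here
is a minimal Python model of a string stack with one concatenation
operation. The class and method names are hypothetical and this is not
the vvp implementation (which is C++); it only shows the pop-operands,
push-result discipline described above.

```python
# Illustrative model only: StringStack and its methods are hypothetical
# names, not part of vvp.
class StringStack:
    def __init__(self):
        self.stack = []

    def push(self, s):
        self.stack.append(s)

    def concat(self):
        # Like %concat/str: pop the tail (top), then the head, push head + tail.
        tail = self.stack.pop()
        head = self.stack.pop()
        self.stack.append(head + tail)

    def top(self):
        return self.stack[-1]


machine = StringStack()
machine.push("Hello, ")
machine.push("world")
machine.concat()        # no operand addresses are needed
print(machine.top())    # prints: Hello, world
```

The concat step names no registers at all; its operands are implied by
the stack position, which is the compactness argument made above.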

FLAGS

There are up to 16 bits in each thread that are available for
flags. These are used as destinations for operations that return
boolean values, for example comparisons. They are also used as inputs
for test and branch opcodes.

* %abs/wr <bit-o>, <bit-i>

This instruction calculates the absolute value of a real value. It uses
the fabs() function in the run-time to do the work.

* %add <bit-l>, <bit-r>, <wid> (XXXX Old version)

This instruction adds the right vector into the left vector, the
vectors having the width <wid>. If any of the bits of either vector
are x or z, the result is x. Otherwise, the result is the arithmetic
sum.

See also the %sub instruction.

* %add

This opcode pops two vec4 values from the vec4 stack, adds them, and
pushes the result back onto the stack. The input values must have the
same width, and the pushed result will have the same width.

See also the %sub instruction.
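A minimal Python sketch of the vec4 %add behavior, assuming vectors are
modeled as MSB-first strings over '01xz' (a representation chosen here
for readability, not how vvp stores bits). The x rule is taken from the
old %add description above; dropping the carry out is an assumption that
follows from the result having the same width as the inputs.

```python
def add_vec4(left, right):
    """Model of %add on MSB-first strings over '01xz' of equal width."""
    if len(left) != len(right):
        raise ValueError("%add requires operands of the same width")
    width = len(left)
    if any(bit in "xz" for bit in left + right):
        return "x" * width                        # any x/z bit makes the sum x
    total = (int(left, 2) + int(right, 2)) % (1 << width)  # carry out dropped
    return format(total, f"0{width}b")
```

For example, add_vec4("1111", "0001") wraps to "0000", and any x or z
input bit turns the whole result into x.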

* %add/wr <bit-l>, <bit-r>

This is the real-valued version of the %add instruction. The arguments
are popped from the real value stack, right operand then left, and the
result is pushed in their place.

See also the %sub/wr instruction.


* %addi <bit-l>, <imm>, <wid>

This instruction adds the immediate value (no x or z bits) into the
left vector. The imm value is limited to 16 significant bits, but it
is zero extended to match any width.

* %alloc <scope-label>

This instruction allocates the storage for a new instance of an
automatically allocated scope.

* %and

Perform the bitwise AND of the two vectors popped from the vec4 stack,
and push the result. Each bit is calculated independently of the other
bits. AND means the following:

	0 and ? --> 0
	? and 0 --> 0
	1 and 1 --> 1
	otherwise   x

The input vectors must be the same width, and the output vector will
be the width of the input.
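The truth table above can be expressed as a small Python model. This is
a sketch assuming vectors are MSB-first strings over '01xz'; the
function names are hypothetical.

```python
def and_bit(a, b):
    """One bit of %and: 0 dominates, 1 and 1 gives 1, anything else is x."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "x"


def and_vec4(left, right):
    """Model of %and: both inputs must have the same width."""
    if len(left) != len(right):
        raise ValueError("%and requires operands of the same width")
    return "".join(and_bit(a, b) for a, b in zip(left, right))
```

Note that a z input behaves like x unless the other bit is 0, so
and_vec4("01xz", "1111") gives "01xx".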

* %and/r

Pop the top value from the vec4 stack, perform a reduction &, then
push the single-bit result back onto the vec4 stack.

* %assign/ar <array-label>, <delay>
* %assign/ar/d <array-label>, <delayx>
* %assign/ar/e <array-label>

The %assign/ar instruction assigns a real value to a word in the
labeled real array. The <delay> is the delay in simulation time to
the assignment (0 for non-blocking assignment) and the value is popped
from the real value stack.

The memory word address is read from index register 3. The address is
in canonical form.

The %assign/ar/d variation reads the delay from an integer register that
is given by the <delayx> value. This should not be 3 or the <bit> index,
of course, since these registers contain the word address and the value.

The %assign/ar/e variation uses the information in the thread
event control registers to determine when to perform the assign.
%evctl is used to set the event control information.

* %assign/av <array-label>, <delay>, <bit> (XXXX Old definition)
* %assign/av/d <array-label>, <delayx>, <bit> (XXXX Old definition)
* %assign/av/e <array-label>, <bit> (XXXX Old definition)

The %assign/av instruction assigns a vector value to a word in the
labeled array. The <delay> is the delay in simulation time to the
assignment (0 for non-blocking assignment) and the <bit> is the base
of the vector to write.

The width of the vector is retrieved from index register 0.

The base of a part select is retrieved from index register 1.

The address of the word in the memory is taken from index register 3.
The address is in canonical form.

The %assign/av/d variation reads the delay from an integer register that
is given by the <delayx> value. This should not be 0, 1 or 3, of course,
since these registers contain the vector width, base part select and
word address.

The %assign/av/e variation uses the information in the thread
event control registers to determine when to perform the assign.
%evctl is used to set the event control information.

* %assign/v0 <var-label>, <delay>, <bit> (XXXX Old description)
* %assign/v0/d <var-label>, <delayx>, <bit> (XXXX Old description)
* %assign/v0/e <var-label>, <bit> (XXXX Old description)

The %assign/v0 instruction is a vector version of non-blocking
assignment. The <delay> is the number of clock ticks in the future
where the assignment should be scheduled, and the <bit> is the base of
the vector to be assigned to the destination. The vector width is in
index register 0.

The %assign/v0/d variation gets the delay instead from an integer
register that is given by the <delayx> value. This should not be 0, of
course, because integer 0 is taken with the vector width.

The %assign/v0/e variation uses the information in the thread
event control registers to determine when to perform the assign.
%evctl is used to set the event control information.

The <var-label> references a .var object that can receive non-blocking
assignments. For blocking assignments, see %set/v.

* %assign/v0/x1 <var-label>, <delay>, <bit>
* %assign/v0/x1/d <var-label>, <delayx>, <bit>
* %assign/v0/x1/e <var-label>, <bit>

This is similar to the %assign/v0 instruction, but uses index register
1 to hold the canonical index of the destination where the vector is
to be written. This allows for part writes into the vector.

* %assign/vec4 <var-label>, <delay>
* %assign/vec4/d <var-label>, <delayx>
* %assign/vec4/e <var-label>

The %assign/vec4 instruction is a vec4 version of non-blocking
assignment. The <delay> is the number of clock ticks in the future
where the assignment should be scheduled, and the value to assign is
pulled from the vec4 stack.

The %assign/vec4/d instruction is the same, but gets its delay value
from the index register <delayx> instead.

* %assign/vec4/a/d <var-label>, <off-index>, <delay-index>
* %assign/vec4/a/e <var-label>, <off-index>

This instruction implements delayed assignment to an array word. The
value is popped from the vec4 stack; the width is taken from the
popped value. The <off-index> index register contains the canonical
offset into the memory word for a part select, and the <delay-index>
index register contains the delay for the assignment. Index register 3
contains the word address.

The <off-index> and <delay-index> index registers can be 0, which
means a zero value instead of the contents of index register 0.

* %assign/vec4/off/d <var-label>, <off-index>, <delay-index>

This is for writing parts to the target variable. The <var-label> is
the variable to write, as usual. The <off-index> selects an index
register that holds the offset into the target variable, and the
<delay-index> selects the index register that contains the delay. The
offset is in canonical bits. The width that is written is taken from
the width of the value on the stack.

* %assign/wr <vpi-label>, <delay>
* %assign/wr/d <vpi-label>, <delayx>
* %assign/wr/e <vpi-label>

This instruction provides a non-blocking assign of a real value to the
real object addressed by the <vpi-label> label after the given
<delay>. The real value is popped from the real value stack.

The %assign/wr/d variation gets the delay from integer register
<delayx>.

The %assign/wr/e variation uses the information in the thread
event control registers to determine when to perform the assign.
%evctl is used to set the event control information.

* %assign/x0 <var-label>, <delay>, <bit> (OBSOLETE -- See %assign/v0/x1)

This does a non-blocking assignment to a functor, similar to the
%assign instruction. The <var-label> identifies the base functor of
the affected variable, and the <delay> gives the delay when the
assignment takes place. The delay may be 0. The actual functor used is
calculated by using <var-label> as a base, and indexing with the
index[0] index register. This supports indexed assignment.

The <bit> is the address of the thread register that contains the bit
value to assign.


* %blend

This instruction blends the bits of two vectors into a result in a
manner like the expression ('bx ? <a> : <b>). The two source vectors
are popped from the vec4 stack (and must have the same width) and the
result is pushed in their place. The truth table for each bit is:

	1  1 --> 1
	0  0 --> 0
	z  z --> z
	x  x --> x
	.... --> x

In other words, if the bits are identical, then take that
value. Otherwise, the value is x.
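A one-line Python model of the blend rule, assuming MSB-first strings
over '01xz' (hypothetical function name, not vvp code):

```python
def blend_vec4(a, b):
    """Model of %blend: keep bits that agree, otherwise produce x."""
    if len(a) != len(b):
        raise ValueError("%blend requires operands of the same width")
    return "".join(x if x == y else "x" for x, y in zip(a, b))
```

Identical z bits survive as z, which is what distinguishes %blend from
simply marking every uncertain bit as x.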

* %blend/wr

This instruction blends real values for the ternary operator. If the
values match, the result is that value; otherwise it is 0.0. Two
values are popped from the real value stack, and one is pushed back.

* %breakpoint

This instruction unconditionally breaks the simulator into the
interactive debugger. The idea is to stop the simulator here and give
the user a chance to display the state of the simulation using
debugger commands.

This may not work on all platforms. If run-time debugging is compiled
out, then this function is a no-op.

* %cassign/vec4 <var-label>
* %cassign/vec4/off <var-label>, <off-index>

Perform a continuous assign of a constant value to the target
variable. This is similar to %set, but it uses the cassign port
(port-1) of the signal functor instead of the normal assign, so the
signal responds differently. See "VARIABLE STATEMENTS" in the
README.txt file.

* %cassign/wr <var-label>

Perform a continuous assign of a constant real value to the target
variable. See %cassign/vec4 above. The value is popped from the real
value stack.

* %cast2 <dst>, <src>, <wid>

Convert the source vector, of type logic, to a bool vector by
changing all the X and Z bits to 0. The source and destinations may
overlap.

* %cmp/u <bit-l>, <bit-r>, <wid> (XXXX Old meaning)
* %cmp/s <bit-l>, <bit-r>, <wid> (XXXX Old meaning)

These instructions perform a generic comparison of two vectors of equal
size. The <bit-l> and <bit-r> numbers address the least-significant
bit of each vector, and <wid> is the width. If either operand is 0,
1, 2 or 3 then it is taken to be a constant replicated to the selected
width.

The results of the comparison go into bits 4, 5 and 6:

	4: eq  (equal)
	5: lt  (less than)
	6: eeq (case equal)

The eeq bit is set to 1 if all the bits in the vectors are exactly the
same, or 0 otherwise. The eq bit is true if the values are logically
the same. That is, x and z are considered equal. In other words the eq
bit is the same as ``=='' and the eeq bit ``===''.

The lt bit is 1 if the left vector is less than the right vector, or 0
if greater than or equal to the right vector. It is the equivalent of
the Verilog < operator. Combinations of these three bits can be used
to implement all the Verilog comparison operators.

The %cmp/u and %cmp/s differ only in the handling of the lt bit. The
%cmp/u does an unsigned compare, whereas the %cmp/s does a signed
compare. In either case, if either operand contains x or z, then lt
bit gets the x value.

* %cmp/s
* %cmp/u

These instructions perform a generic comparison of two vectors of
equal size. Two values are pulled from the top of the stack, and not
replaced. The results are written into flag bits 4, 5 and 6. The
expressions (a<b), (a==b) and (a===b) are calculated, with (b) popped
from the stack first, then (a).

The results of the comparison go into flags 4, 5 and 6:

	4: eq  (equal)
	5: lt  (less than)
	6: eeq (case equal)
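The following Python sketch computes the three flag values for %cmp/u
and %cmp/s, assuming MSB-first strings over '01xz'. The handling of x/z
for eq follows the usual Verilog == rules (0 if some bit pair is
definitely different, otherwise x), which is an assumption here; the x
value for lt comes from the older description above. The function name
is hypothetical.

```python
def cmp_vec4(a, b, signed=False):
    """Return (eq, lt, eeq) for flags 4, 5 and 6.
    a is the left operand (popped second), b the right (popped first)."""
    if len(a) != len(b):
        raise ValueError("%cmp requires operands of the same width")
    eeq = "1" if a == b else "0"              # === compares x/z literally
    if any(c in "xz" for c in a + b):
        differs = any({x, y} == {"0", "1"} for x, y in zip(a, b))
        return ("0" if differs else "x", "x", eeq)
    ia, ib = int(a, 2), int(b, 2)
    if signed:                                # two's complement for %cmp/s
        width = len(a)
        if a[0] == "1":
            ia -= 1 << width
        if b[0] == "1":
            ib -= 1 << width
    return ("1" if ia == ib else "0", "1" if ia < ib else "0", eeq)
```

With signed=True, "1111" is -1 and so compares less than "0001"; the
same operands compared unsigned give lt = 0.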

* %cmpi/s <bit-l>, <immr>, <wid>
* %cmpi/u <bit-l>, <immr>, <wid>

These instructions are similar to the %cmp instructions above, except
that the right hand operand is an immediate value. This is a positive
number that the vector is compared with.

* %cmp/wr

Compare real values for equality and less-than. This opcode pops two
values from the real value stack and writes the comparison results to
flags 4 and 5. The expressions (a<b) and (a==b) are calculated, with
(b) popped from the stack first, then (a).

* %cmp/ws <bit-l>, <bit-r>
* %cmp/wu <bit-l>, <bit-r>

[compare signed/unsigned integer words.]

* %cmp/z
* %cmp/x

These instructions are for implementing the casez and casex
comparisons. These work similarly to the %cmp/u instructions, except
only an eq bit is calculated. These comparisons both treat z values in
the left or right operand as don't care positions. The %cmp/x
instruction will also treat x values in either operand as don't care.

Only bit 4 is set by these instructions.
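A minimal Python sketch of the casez/casex equality, assuming MSB-first
strings over '01xz'. Positions that are not don't-care are compared
literally (so x matches x but not 0), which is an assumption based on
Verilog casez semantics. The function and parameter names are
hypothetical.

```python
def cmp_casez(a, b, treat_x_as_dont_care=False):
    """Model of %cmp/z, or %cmp/x when treat_x_as_dont_care is True.
    Returns the eq value written to flag 4."""
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    dont_care = "zx" if treat_x_as_dont_care else "z"
    for x, y in zip(a, b):
        if x in dont_care or y in dont_care:
            continue                  # this position always matches
        if x != y:
            return "0"
    return "1"
```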

* %cmp/str

This instruction pops the top two strings from the string stack and
compares them. The results of the comparison go into bits 4 and 5:

	4: eq  (equal)
	5: lt  (less than)

For the purposes of calculating the lt bit, the top string is the
right operand and the string underneath is the left operand. This
instruction removes two strings from the stack.

* %concat/str
* %concati/str <string>

Pop the top string, and concatenate it to the new top string. Or think
of it as popping the tail, then the head, concatenating them, and
pushing the result. The stack starts with two strings in the stack,
and ends with one string in the stack.

* %concat/vec4

Pop two vec4 vectors, concatenate them, and push the combined
result. The top of the vec4 stack supplies the LSBs of the result,
and the next entry in the stack supplies the MSBs.
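The operand order is easy to get backwards, so here is a short Python
sketch. It assumes the stack is a list whose last element is the top
and vectors are MSB-first strings; the function name is hypothetical.

```python
def concat_vec4(stack):
    """Model of %concat/vec4 on a list used as a stack (top = last item)."""
    lsb_part = stack.pop()          # top of stack -> least significant bits
    msb_part = stack.pop()          # next entry   -> most significant bits
    stack.append(msb_part + lsb_part)
```

Pushing a and then b and executing the concatenation therefore yields
the Verilog concatenation {a, b}.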

* %cvt/sr <bit-l>
* %cvt/rs <bit-l>

Copy a word from r to l, converting it from real to signed integer (sr)
or signed integer to real (rs) in the process. The source and destination
may be the same word address, leading to a convert in place. Precision
may be lost in the conversion.

The %cvt/sr <bit-l> gets the real value from the top of the real value
stack (and pops the value) and writes it to the indexed register.

* %cvt/ur <bit-l>
* %cvt/ru <bit-r>

Copy a word from r to l, converting it from real to unsigned integer (ur)
or unsigned integer to real (ru) in the process. The source and destination
may be the same word address, leading to a convert in place. Precision
may be lost in the conversion.

* %cvt/rv <bit-r>, <wid>
* %cvt/rv/s <bit-r>, <wid>

The %cvt/rv instruction converts a thread vector starting at <bit-r>
and with the width <wid> to a real word. Push the result onto the real
value stack. Precision may be lost in the conversion.

The %cvt/rv/s instruction is the same as %cvt/rv, but treats the thread
vector as a signed value.

* %cvt/vr <wid>

The %cvt/vr opcode converts a real word from the stack to a vec4 that
is <wid> wide. Non-integer precision is lost in the conversion, and
the real value is popped from the stack. The result is pushed to the
vec4 stack.

* %deassign <var-label>, <base>, <width>

Deactivate and disconnect a procedural continuous assignment to a
variable. The <var-label> identifies the affected variable.

The <base> and <width> are used to determine what part of the signal
will be deactivated. For a full deactivation the <base> is 0 and
<width> is the entire signal width.

* %deassign/wr <var-label>

The same as %deassign above except this is used for real variables.

* %debug/thr

These opcodes are aids for debugging the vvp engine. The vvp code
generator should not generate these, and they should not alter code
flow, data contents, etc.

* %delay <low>, <high>

This opcode pauses the thread, and causes it to be rescheduled for a
time in the future. The amount is the number of ticks in the future
to reschedule, and is >= 0. If the %delay is zero, then the thread
yields the processor to another thread, but will be resumed in the
current time step.

The delay amount is given as two 32-bit numbers, so that 64-bit times
may be represented.

* %delayx <idx>

This is similar to the %delay opcode, except that the parameter
selects an index register, which contains the actual delay. This
supports run-time calculated delays.

* %delete/obj <var-label>

Arrange for the dynamic object at the target label to be deleted.
This has no effect on the object or string stack. Note that this is
the same as:

   %null ;
   %store/obj <var-label>

but that idiom is expected to be common enough that it warrants an
optimized shorthand.

* %disable <scope-label>

This instruction terminates threads that are part of a specific
scope. The label identifies the scope in question, and the threads are
the threads that are currently within that scope.

* %disable/fork

This instruction terminates all the detached children for the current
thread. There should not be any non-detached children.


* %div <bit-l>, <bit-r>, <wid>
* %div/s <bit-l>, <bit-r>, <wid>

This instruction arithmetically divides the <bit-l> vector by the
<bit-r> vector, and leaves the result in the <bit-l> vector. If any of
the bits in either vector are x or z, the entire result is x.

The %div/s instruction is the same as %div, but does signed division.


* %div/wr

This opcode divides the left operand by the right operand. If the
right operand is 0, then the result is NaN.


* %dup/real
* %dup/vec4

These opcodes duplicate the value on the top of the stack for the
corresponding type.

* %evctl <functor-label> <idx>
* %evctl/c
* %evctl/s <functor-label> <idx>
* %evctl/i <functor-label> <value>

These instructions are used to put event and repetition information
into the thread event control registers. These values are then used
by the %assign/e instructions to do non-blocking event control. The
<functor-label> is the event to trigger on and the <idx> is an index
register to read the repetition count from (signed or unsigned).
%evctl/i sets the repetition to an immediate unsigned value.

%evctl/c clears the event control information. This is needed if a
%assign/e may be skipped since the %assign/e statements clear the
event control information and the other %evctl statements assert
that this information has been cleared. You can get an assert if
this information is not managed correctly.

* %event <functor-label>

This instruction is used to send a pulse to an event object. The
<functor-label> is an event variable. This instruction simply writes
an arbitrary value to the event to trigger the event.

* %file_line <file> <line> <description>

This command emits the provided file and line information along with
the description when it is executed. The output is sent to stderr and
the format of the output is:
   <file>:<line>: <description>
<file> is the unsigned numeric file index.
<line> is the unsigned line number.
<description> is a string; if it is 0, then the default message
"Procedural tracing." is used.

* %flag_set/imm <flag>, <value>

This instruction sets an immediate value into a flag bit. This is a
single bit, and the value is encoded as 0 for 0, 1 for 1, 2 for z and
3 for x.

* %flag_get/vec4 <flag>
* %flag_set/vec4 <flag>

These instructions provide a means for accessing flag bits. The
%flag_get/vec4 loads the numbered flag as a vec4 on top of the vec4
stack, and the %flag_set/vec4 pops the top of the vec4 stack and
writes the LSB to the selected flag.

* %force/v <label>, <bit>, <wid>

Force a constant value to the target variable. This is similar to %set
and %cassign/vec4, but it uses the force port (port-2) of the signal
functor instead of the normal assign port (port-0), so the signal
responds differently. See "VARIABLE STATEMENTS" and "NET STATEMENTS"
in the README.txt file.

* %force/wr <var-label>

Force a constant real value to the target variable. See %force/v
above. The value is popped from the real value stack.

* %force/x0 <label>, <bit>, <wid>

Force a constant value to part of the target variable. This is similar
to the %set/x instruction, but it uses the force port (port-2) of the signal
functor instead of the normal assign port (port-0), so the signal
responds differently. See "VARIABLE STATEMENTS" and "NET STATEMENTS"
in the README.txt file.

* %fork <code-label>, <scope-label>

This instruction is similar to %jmp, except that it creates a new
thread to start executing at the specified address. The new thread is
created and pushed onto the child stack.  It is also marked runnable,
but is not necessarily started until the current thread yields.

The %fork instruction has no effect other than to push a child thread.

See also %join.

* %free <scope-label>

This instruction de-allocates the storage for a previously allocated
instance of an automatically allocated scope.


* %inv

Perform a bitwise invert of the vector on top of the vec4 stack. The result
replaces the input. Invert means the following, independently, for each
bit:

	0  --> 1
	1  --> 0
	x  --> x
	z  --> x
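As a sketch, the invert table maps directly onto a lookup in Python,
assuming MSB-first strings over '01xz' (hypothetical names, not vvp
code):

```python
# Per-bit invert table from the description of %inv.
INV = {"0": "1", "1": "0", "x": "x", "z": "x"}


def inv_vec4(vec):
    """Model of %inv: invert each bit independently; z becomes x."""
    return "".join(INV[bit] for bit in vec)
```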


* %ix/vec4 <idx>
* %ix/vec4/s <idx>

This instruction loads a vec4 value from the vec4 stack, into the
index register <idx>. The value is popped from the vec4 stack and
written to the index register.

The %ix/vec4 instruction converts the 4-value bits into a binary
number, without sign extension. If any of the bits of the vector is x
or z, then the index register gets the value 0. The %ix/vec4/s
instruction is the same, except that it treats the source vector as
signed and sign-extends it to fit the index register.

The instruction also writes into bit 4 a 1 if any of the bits of the
input vector are x or z. This is a flag that the 0 value written into
the index register is really the result of calculating from unknown
bits.

	4: unknown value
	5: (reserved)
	6: (reserved)
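A Python sketch of the conversion, assuming MSB-first strings over
'01xz'. It also assumes flag 4 is written as 0 when every bit is known;
the text above only states that it is set to 1 for x or z bits. The
function name is hypothetical.

```python
def ix_vec4(vec, signed=False):
    """Model of %ix/vec4 (signed=False) and %ix/vec4/s (signed=True).
    Returns (index register value, flag 4 value)."""
    if any(bit in "xz" for bit in vec):
        return 0, "1"                 # unknown input: value 0, flag 4 set
    value = int(vec, 2)
    if signed and vec[0] == "1":
        value -= 1 << len(vec)        # sign extend from the vector's MSB
    return value, "0"
```

The same bit pattern "1111" loads as 15 with %ix/vec4 but as -1 with
%ix/vec4/s.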

* %ix/getv <idx>, <functor-label>
* %ix/getv/s <idx>, <functor-label>

These instructions are like the %ix/get instructions, except that they
read directly from a functor label instead of from thread bits. They
set bit 4 just like %ix/get.

* %ix/load <idx>, <low>, <high>

This instruction loads an immediate value into the addressed index
register. The index register holds 64 bit numeric values, so <low>
and <high> are used to separate the value in two 32 bit chunks.
The idx value selects the index register. This is different from
%ix/get, which loads the index register from a value in the thread bit
vector. The values are unsigned decimal values and are combined as
<high> << 32 | <low> to produce the final value.
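The split into <low> and <high> can be sketched in Python as follows.
The helper names are hypothetical; join64 is the formula given above,
and split64 is the inverse a code generator would need. The %delay
operands presumably use the same two-chunk convention for 64-bit times.

```python
def split64(value):
    """Split an unsigned 64-bit value into the (<low>, <high>) operands."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 unsigned bits")
    return value & 0xFFFFFFFF, value >> 32


def join64(low, high):
    """Recombine the operands as (<high> << 32) | <low>."""
    return (high << 32) | low
```

For example, 5000000000 does not fit in 32 bits and splits into
<low> = 705032704 and <high> = 1.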


* %ix/add <idx>, <low>, <high>
* %ix/sub <idx>, <low>, <high>
* %ix/mul <idx>, <low>, <high>

These instructions add, subtract, or multiply the selected index
register by the immediate value. The 64 bit immediate value is built
from the two 32 bit chunks <low> and <high> (see %ix/load above).
The <idx> value selects the index register.

* %ix/mov <dst>, <src>

This instruction simply sets the index register <dst> to the value of
the index register <src>.

* %jmp <code-label>

The %jmp instruction performs an unconditional branch to a given
location. The parameter is the label of the destination instruction.

* %jmp/[01xz] <code-label>, <flag>

This is a conditional version of the %jmp instruction. In this case,
a flag bit (addressed by <flag>) is tested. If it is one of the
values in the part after the /, the jump is taken. For example:

	%jmp/xz T_label, 8;

will jump to T_label if flag 8 is x or z.

* %join

This is the partner to %fork. This instruction causes the thread to
wait for the top thread in the child stack to terminate, then
continues. It has no effect in the current thread other than to wait
until the top child is cleared.

It is an error to execute %join if there are no children in the child
stack. Every %join in the thread must have a matching %fork that
spawned off a child thread.

If the matching child thread is still running, a %join suspends
the calling thread until the child ends. If the child is already
ended, then the %join does not block or yield the thread.

* %join/detach <n>

This is also a partner to %fork. This instruction causes the thread
to detach <n> threads from the current thread. The <n> should be ALL
the children, and none of those children may be automatic. This
instruction is used to implement join_none and join_any from the
Verilog source.

* %load/av <bit>, <array-label>, <wid>

This instruction loads a word from the specified array. The word
address is in index register 3. Like %load/v below the width does
not have to match the width of the array word. See the %load/v
description for more information.

* %load/avp0 <bit>, <array-label>, <wid>
* %load/avp0/s <bit>, <array-label>, <wid>

This instruction is a mix of %load/av and %load/vp0. It loads an array
value like %load/av and then adds a value from index register 0 to the
result like %load/vp0. The loaded value is zero-extended to <wid>,
then added arithmetically to the signed index register 0. The result
is then stored in <bit>.

The %load/avp0/s instruction is the same, except that the loaded
vector is sign extended (instead of 0-extended) before the addition.

* %load/avx.p <bit>, <array-label>, <index>

This instruction is similar to %load/av, but it loads only a single
bit, and the <index> is the selector for the bit to use. If <index> is
out of range, then x is loaded. The index value is incremented by one
if it is defined (bit 4 is not 1).

* %load/dar <bit>, <functor-label>, <wid>
* %load/dar/r <functor-label>

This instruction loads an array word from a dynamic array. The
<functor-label> refers to the variable object, and the <bit>/<wid> are the
location in local vector space where the extracted word goes. The
index is implicitly extracted from index register 3.

The dar/r variant reads a real-value into a real-valued register.

(See also %set/dar)

* %load/obj <var-label>

This instruction loads an object handle and pushes it to the top of
the object handle stack.

See also %store/obj.

* %load/real <vpi-label>

The %load/real instruction reads a real value from the vpi-like object
and pushes it to the top of the real value stack.

* %load/str <var-label>
* %load/stra <array-label>, <index>
* %load/dar/str <var-label>

The %load/str instruction gets the string from the string variable and
pushes it onto the string stack. (See also %store/str)

The %load/dar/str is similar, but the variable is a dynamic array of
strings, and there is an index value in index register 3.
(See also %store/dar/str)


* %load/v <bit>, <functor-label>, <wid> (XXXX Old implementation)

This instruction loads a vector value from the given functor node into
the specified thread register bit. The functor-label can refer to a
.net, a .var or a .functor with a vector output. The entire vector,
from the least significant up to <wid> bits, is loaded starting at
thread bit <bit>. It is OK for the width to not match the vector
width at the functor. If the <wid> is less than the width at the
functor, then the most significant bits are dropped. If the <wid> is
more than the width at the functor, the value is padded with X bits.

* %load/vec4 <var-label>

This instruction loads a vector value from the given functor node and
pushes it onto the vec4 stack. See also the %store/vec4 instruction.

* %load/vec4a <arr-label>, <addr-index>

This instruction loads a vec4 value from the array and pushes the
value onto the stack. The <addr-index> is the index register that
holds the canonical array index.

The load checks flag bit 4. If it is 1, then the load is cancelled and
replaced with a load of all X bits. See %ix/vec4.

* %load/vp0 <bit>, <functor-label>, <wid>
* %load/vp0/s <bit>, <functor-label>, <wid>

This instruction is similar to %load/v above, except that it also
adds the signed integer value in index register 0 into the loaded
value. The addition is a Verilog-style add, which means that if any of
the input bits are X or Z, the entire result is turned into a vector
of X bits.

The <wid> is, like the %load/v, the result width. But unlike the
%load/v, the vector is padded with 0s (%load/vp0) or sign extended
(%load/vp0/s) to the desired width.

* %load/ar <array-label>, <index>

The %load/ar instruction reads a real value from an array. The <index>
is the index register that contains the canonical word address into
the array.

* %load/x1p <bit>, <functor-label>, <wid>

This is an indexed load. It uses the contents of index register 1 to
select a part from a vector functor at <functor-label>. The
part is pulled from the indexed bit of the addressed functor and loaded
into the destination thread bit. The <wid> is the width of the
part. If any bit of the desired value is outside the vector, then that
bit is set to X.

The index register 1 is interpreted as a signed value. Even though the
address is canonical (from 0 to the width of the signal) the value in
index register 1 may be <0 or >=wid. The load instruction handles
filling in the out-of-bounds bits with x.

When the operation is done, the <wid> is added to index register 1, to
provide a basic auto-increment behavior.
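A Python sketch of the indexed part load with out-of-bounds filling and
auto-increment. It assumes the signal is an LSB-first list so that
element i is canonical bit i, and models the index registers as a dict;
both representations and the function name are hypothetical.

```python
def load_x1p(signal, index_regs, wid):
    """Model of %load/x1p. Returns an LSB-first list of wid bits and
    advances index register 1 by wid."""
    base = index_regs[1]                  # signed canonical offset
    part = []
    for i in range(base, base + wid):
        # Out-of-range bits (including negative offsets) read as x.
        part.append(signal[i] if 0 <= i < len(signal) else "x")
    index_regs[1] = base + wid            # basic auto-increment
    return part
```

The explicit 0 <= i check matters in Python, where a negative list
index would otherwise silently wrap around to the end of the signal.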

* %loadi/wr <bit>, <mant>, <exp>

This opcode loads an immediate value, floating point, into the word
register selected by <bit>. The mantissa is an unsigned integer value,
up to 32 bits, that is multiplied by 2**(<exp>-0x1000) to make a real
value. The sign bit is OR-ed into the <exp> value at bit 0x4000, and
is removed from the <exp> before calculating the real value.

If <exp>==0x3fff and <mant> == 0, the value is +inf.
If <exp>==0x7fff and <mant> == 0, the value is -inf.
If <exp>==0x3fff and <mant> != 0, the value is NaN.
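A Python sketch that decodes the <mant>, <exp> operands according to
the rules above (hypothetical function name; encodings not listed above,
such as <exp>==0x7fff with a non-zero <mant>, are not handled
specially).

```python
import math


def loadi_wr(mant, exp):
    """Decode the %loadi/wr operands into a Python float."""
    if exp == 0x3fff and mant == 0:
        return math.inf
    if exp == 0x7fff and mant == 0:
        return -math.inf
    if exp == 0x3fff:
        return math.nan
    negative = bool(exp & 0x4000)
    exp &= ~0x4000                          # strip the sign bit
    value = math.ldexp(mant, exp - 0x1000)  # mant * 2**(exp - 0x1000)
    return -value if negative else value
```

For example, 1.5 can be encoded as <mant>=3, <exp>=0x0fff, and -1.5 by
also setting the 0x4000 sign bit (<exp>=0x4fff).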

* %max/wr
* %min/wr

This instruction pops the top two values from the real stack and
pushes back the maximum (or, for %min/wr, the minimum). To avoid
returning NaN, if either value is NaN the other value is selected.
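The NaN rule can be sketched in Python as below (hypothetical function
names). C's fmax() and fmin() follow the same convention of returning
the non-NaN operand, which makes them a natural fit for an
implementation.

```python
import math


def max_wr(a, b):
    """Model of %max/wr: a NaN operand loses to the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b


def min_wr(a, b):
    """Model of %min/wr with the same NaN rule."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a <= b else b
```

Only when both operands are NaN is a NaN pushed back.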

* %mod
* %mod/s

This instruction calculates the modulus (remainder) of the left operand
divided by the right operand. The left and right vectors are popped
from the vec4 stack and have identical widths. The result is pushed
onto the vec4 stack.

The /s form computes a signed modulus.
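A Python sketch of %mod and %mod/s on MSB-first strings over '01xz'.
It applies the Verilog language rules rather than anything stated
above: x/z operands or a zero divisor give an all-x result, and a
signed remainder takes the sign of the dividend. That last point is a
common pitfall, since Python's own % takes the sign of the divisor.
The function name is hypothetical.

```python
def mod_vec4(left, right, signed=False):
    """Model of %mod (signed=False) and %mod/s (signed=True)."""
    if len(left) != len(right):
        raise ValueError("%mod requires operands of the same width")
    width = len(left)
    if any(c in "xz" for c in left + right):
        return "x" * width
    a, b = int(left, 2), int(right, 2)
    if b == 0:
        return "x" * width                 # modulus by zero is x in Verilog
    if signed:
        if left[0] == "1":
            a -= 1 << width                # two's complement dividend
        if right[0] == "1":
            b -= 1 << width                # two's complement divisor
        r = abs(a) % abs(b)
        if a < 0:
            r = -r                         # sign follows the dividend
    else:
        r = a % b
    return format(r % (1 << width), f"0{width}b")
```

In 4 bits, "1001" is -7, so mod_vec4("1001", "0011", signed=True)
gives -1 ("1111"), while the unsigned form computes 9 % 3 = 0.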

* %mod/wr

This opcode is the real-valued modulus of the two real values.

* %mov <dst>, <src>, <wid>
* %mov/wu <dst>, <src>
* %movi <dst>, <value>, <wid>

This instruction copies a vector from one place in register space to
another. The destination and source vectors are assumed to be the same
width and non-overlapping. The <dst> may not be 0-3, but if the <src>
is one of the 4 constant bits, the effect is to replicate the value
into the destination vector. This is useful for filling a vector.

The %movi variant moves a binary value, LSB first, into the
destination vector. The immediate value is up to 32 bits, padded with
zeros to fill out the width.

* %mul

This instruction multiplies the left vector by the right vector, the
vectors are popped from the vec4 stack and have the same width. If
any of the bits of either vector are x or z, the result is
x. Otherwise, the result is the arithmetic product. In any case, the
result is pushed back on the vec4 stack.


* %mul/wr

This opcode multiplies two real words together.


* %muli <bit-l>, <imm>, <wid>

This instruction is the same as %mul, but the second operand is an
immediate value that is padded to the width of the result.


* %nand

Perform the bitwise NAND of the two vec4 vectors popped from the vec4
stack, and push the result. Each bit is calculated independently of
the other bits. NAND means the following:

	0 nand ? --> 1
	? nand 0 --> 1
	1 nand 1 --> 0
	otherwise    x


* %new/cobj <label>

Create a new class object. The <label> is the VPI label for a class
type definition.

* %new/darray <idx>, "<type>"

Create a new dynamic array with a given size. The <idx> is the
address of an index register that contains the computed array size to
use. The <type> is a string that expresses the type of the elements of
the array. See also %delete/obj.

The supported types are:

         "b<N>"     - unsigned bool <N>-bits
         "sb<N>"    - signed bool <N>-bits
	 "r"        - real
	 "S"        - SystemVerilog string

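The <type> strings listed above are simple enough to decode with a
regular expression. The parser below is a hypothetical sketch, not the
parser vvp uses. It returns a (kind, signed, width) tuple, with None
where a field does not apply.

```python
import re

# Hypothetical decoder for the <type> operand of %new/darray.

def parse_darray_type(type_str: str):
    """Map "b<N>", "sb<N>", "r" or "S" to (kind, signed, width)."""
    if type_str == "r":
        return ("real", None, None)
    if type_str == "S":
        return ("string", None, None)
    match = re.fullmatch(r"(s?)b(\d+)", type_str)
    if match is not None:
        # Optional "s" prefix marks a signed bool vector of N bits.
        return ("bool", match.group(1) == "s", int(match.group(2)))
    raise ValueError("unsupported darray element type: " + repr(type_str))
```
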
* %nor

Perform the bitwise NOR of two vec4 vectors, and push the result. Each
bit in the source vectors is combined to make a result bit according to
the truth table:

	1 nor ? --> 0
	? nor 1 --> 0
	0 nor 0 --> 1
	otherwise  x


* %nor/r <dst>, <src>, <wid> (XXXX Old definition)

The %nor/r instruction is a reduction nor. That is, the <src> is a
vector with width, but the result is a single bit. The <src> vector is
not affected by the operation unless the <dst> bit is within the
vector. The result is calculated before the <dst> bit is written, so
it is valid to place the <dst> within the <src>.

The actual operation performed is the inverted or of all the bits in
the vector.

* %nor/r

The %nor/r instruction is a reduction nor. That is, a vec4 value is
popped from the vec4 stack, the bits of the vector are or'ed together
to a single bit, that bit is inverted, and the resulting 1-bit vector is
pushed back to the vec4 stack. See also the "%or" instruction.
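
A minimal sketch of the reduction, assuming the 4-state OR rules of the
%or truth table: any 1 bit gives 1, all 0 bits give 0, and anything else
gives x. The vec4 stack is modelled as a Python list of MSB-first
strings, and the function names are hypothetical.

```python
# Hypothetical sketch of %nor/r on a list-based vec4 stack.

def reduce_or(vec: str) -> str:
    """OR all bits of a 4-state vector into a single bit."""
    if "1" in vec:
        return "1"
    if all(bit == "0" for bit in vec):
        return "0"
    return "x"  # no 1 bits, but at least one x or z

def nor_r(vec4_stack: list) -> None:
    """Pop a vector, push the inverted OR of its bits as a 1-bit vector."""
    bit = reduce_or(vec4_stack.pop())
    vec4_stack.append({"0": "1", "1": "0"}.get(bit, "x"))
```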

* %null

Push a null object onto the object stack. The null object can be used
with any class or darray object, so it is not typed.

* %or

Perform the bitwise or of two vectors. Pop two values from the vec4
stack to get the input arguments. Each bit of the result is computed
from the corresponding bits of the input arguments, according to the
truth table:

	1 or ? --> 1
	? or 1 --> 1
	0 or 0 --> 0
	otherwise  x

The result is then pushed onto the vec4 stack. The inputs and the
output are all the same width.
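
The OR truth table and the %nor table are related: every NOR row is the
inverse of the matching OR row, with x staying x. The hypothetical
Python sketch below (MSB-first strings of "0", "1", "x" and "z") builds
NOR from OR in that way.

```python
# Hypothetical sketch of %or and %nor on MSB-first 4-state strings.

def or_bit(a: str, b: str) -> str:
    """One column of the OR truth table; z behaves like x."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "x"

def invert_bit(a: str) -> str:
    """4-state NOT: 0 <-> 1, anything else becomes x."""
    return {"0": "1", "1": "0"}.get(a, "x")

def vec4_or(left: str, right: str) -> str:
    if len(left) != len(right):
        raise ValueError("operands must have the same width")
    return "".join(or_bit(a, b) for a, b in zip(left, right))

def vec4_nor(left: str, right: str) -> str:
    return "".join(invert_bit(bit) for bit in vec4_or(left, right))
```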

* %or/r <dst>, <src>, <wid>

This is a reduction version of the %or opcode. The <src> is a vector,
and the <dst> is a writable scalar. The <dst> gets the value of the
or of all the bits of the src vector.


* %pad <dst>, <src>, <wid> (XXXX Old version)

This instruction replicates a single bit in register space into a
destination vector in register space. The destination may overlap
the source bit. The <dst> may not be 0-3. This is useful for zero
or sign extending a vector.

* %pad/s <wid>
* %pad/u <wid>

These instructions change the size of the top item in the vec4
stack. If this item is wider than <wid>, it is truncated. If it is
narrower, it is extended. The /s variant sign extends, and the /u
variant zero extends.
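
A minimal sketch of the resize rule on an MSB-first 4-state string. It
assumes truncation keeps the least significant bits and that sign
extension copies the MSB even when that bit is x or z. The function name
is hypothetical.

```python
# Hypothetical sketch of %pad/s and %pad/u on an MSB-first string.

def pad(vec: str, wid: int, signed: bool) -> str:
    """Truncate or extend vec to exactly wid bits."""
    if len(vec) >= wid:
        return vec[len(vec) - wid:]  # keep the wid least significant bits
    fill = vec[0] if signed else "0"  # /s copies the MSB, /u uses 0
    return fill * (wid - len(vec)) + vec
```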

* %part <wid>

This instruction implements a part select. It first pops the base
value from the top of the vec4 stack, then pops the vector to select
from. The width of the part is the fixed number <wid>. The result is
pushed back to the stack.
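
The pop order is easy to get backwards, so here is a hypothetical Python
sketch of the operation. It treats the base as an unsigned canonical
(0-based) LSB index. It also makes two assumptions the text does not
state: selected bits outside the source vector read as x, and an x or z
in the base makes the whole result x.

```python
# Hypothetical sketch of %part <wid>; vec4 values are MSB-first strings.

def part_select(vec4_stack: list, wid: int) -> None:
    base_vec = vec4_stack.pop()  # popped first: the base
    src = vec4_stack.pop()       # popped second: the vector to select from
    if any(bit in "xz" for bit in base_vec):
        vec4_stack.append("x" * wid)  # unknown base (assumed behavior)
        return
    base = int(base_vec, 2)
    lsb_first = src[::-1]
    bits = [lsb_first[i] if 0 <= i < len(src) else "x"  # out of range: x
            for i in range(base, base + wid)]
    vec4_stack.append("".join(reversed(bits)))
```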

* %pop/str <num>
* %pop/real <num>
* %pop/obj <num>, <skip>
* %pop/vec4 <num>

Pop <num> items from the string/real/object/vec4 stack. This is the
opposite of opcodes such as %pushi/str and %pushv/str, which push a
string to the stack. The %pop/str is not normally needed because the
%store/str includes an implicit pop, but sometimes it is necessary to
pop explicitly.

The <skip> operand of %pop/obj is the number of top positions on the
stack to keep before starting to pop. This allows for popping positions
other than the top of the stack.
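
The effect of <skip> can be shown with a Python list whose last element
is the top of the object stack. This is a hypothetical sketch: the top
<skip> entries survive, and the <num> entries just below them are
removed.

```python
# Hypothetical sketch of %pop/obj <num>, <skip>.

def pop_obj(obj_stack: list, num: int, skip: int = 0) -> None:
    if num + skip > len(obj_stack):
        raise IndexError("object stack underflow")
    keep = obj_stack[len(obj_stack) - skip:] if skip else []
    del obj_stack[len(obj_stack) - skip - num:]  # drop kept and popped items
    obj_stack.extend(keep)                       # restore the kept top items
```

For example, with the stack a, b, c, d (d on top), popping 2 with a skip
of 1 leaves a, d.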

* %pow <bit-l>, <bit-r>, <wid>
* %pow/s <bit-l>, <bit-r>, <wid>

The %pow opcode raises <bit-l> (unsigned) to the power of <bit-r>
(unsigned) giving an exact integer result. The %pow/s opcode does
the same for signed values, except it uses the double pow() function
to calculate the result so may not produce exact results. The result
replaces the left operand.
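
The warning about %pow/s precision can be demonstrated in Python. The
sketch contrasts an exact modular power with the same power computed
through a double. Both hypothetical helpers are restricted to
non-negative operands, so they only show why the double-based path may
lose exactness. They do not model signed results.

```python
import math

# Hypothetical sketch; not the vvp implementation of %pow or %pow/s.

def pow_exact(left: int, right: int, width: int) -> int:
    """Exact left**right truncated to width bits (the %pow behavior)."""
    return pow(left, right, 1 << width)

def pow_via_double(left: int, right: int, width: int) -> int:
    """The same power computed with a double, as %pow/s is described."""
    return int(math.pow(left, right)) % (1 << width)
```

3**40 needs 64 bits but is odd and larger than 2**53, so a double cannot
hold it exactly. The two helpers therefore disagree for that input.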


* %pow/wr

This opcode raises the left operand to the power of the right operand,
and pushes the result.

* %prop/v <pid>, <base>, <wid>
* %prop/obj <pid>
* %prop/r <pid>
* %prop/str <pid>

Read a vector (/v) or real value (/r) or string (/str) or object from
the property number <pid> of the class object on the top of the
object stack. Push the resulting value to the appropriate stack. The
class object that is the source is NOT popped from the object stack.

* %pushi/real <mant>, <exp>

This opcode loads an immediate value, floating point, into the real
value stack. The mantissa is an unsigned integer value, up to 32 bits,
that is multiplied by 2**(<exp>-0x1000) to make a real value. The sign
bit is OR-ed into the <exp> value at bit 0x4000, and is removed from
the <exp> before calculating the real value.

If <exp>==0x3fff and <mant> == 0, the value is +inf.
If <exp>==0x7fff and <mant> == 0, the value is -inf.
If <exp>==0x3fff and <mant> != 0, the value is NaN.
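
The encoding rules above can be collected into a small decoder. This
Python sketch is hypothetical and covers only the cases listed in the
text. Other encodings, such as a sign bit combined with a NaN
exponent, are not handled.

```python
import math

# Hypothetical decoder for the %pushi/real <mant>, <exp> operands.

def decode_pushi_real(mant: int, exp: int) -> float:
    if mant == 0 and exp == 0x3fff:
        return math.inf
    if mant == 0 and exp == 0x7fff:
        return -math.inf
    if mant != 0 and exp == 0x3fff:
        return math.nan
    negative = bool(exp & 0x4000)  # sign bit lives at 0x4000
    exp &= ~0x4000                 # strip it before scaling
    value = math.ldexp(mant, exp - 0x1000)  # mant * 2**(exp - 0x1000)
    return -value if negative else value
```

For example, 0.75 is 3 * 2**-2, so it is encoded as <mant>=3 and
<exp>=0x1000-2=0xffe. Setting bit 0x4000 in <exp> negates the value.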

* %pushi/str <text>

Push a literal string to the string stack.

* %pushi/vec4 <vala>, <valb>, <wid>

This opcode loads an immediate value, vector4, into the vector
stack. The <vala> is the boolean value bits, and the <valb> bits are
modifiers to support z and x values. The a/b encodings for the 4
possible logic values are:

   a b  val
   0 0   0
   1 0   1
   1 1   x
   0 1   z

This opcode is limited to 32-bit numbers.
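
The a/b table maps each bit position of <vala> and <valb> to one logic
value. The hypothetical Python helpers below convert between that pair
of integers and an MSB-first 4-state string.

```python
# Hypothetical helpers for the %pushi/vec4 <vala>, <valb>, <wid> encoding.

DECODE = {(0, 0): "0", (1, 0): "1", (1, 1): "x", (0, 1): "z"}
ENCODE = {value: ab for ab, value in DECODE.items()}

def decode_pushi_vec4(vala: int, valb: int, wid: int) -> str:
    """Expand the a/b bit pairs into an MSB-first 4-state string."""
    return "".join(DECODE[((vala >> i) & 1, (valb >> i) & 1)]
                   for i in reversed(range(wid)))

def encode_pushi_vec4(vec: str):
    """Inverse of decode_pushi_vec4: return (vala, valb, wid)."""
    vala = valb = 0
    for bit in vec:  # MSB first, so shift left before adding each bit
        a, b = ENCODE[bit]
        vala = (vala << 1) | a
        valb = (valb << 1) | b
    return vala, valb, len(vec)
```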

* %pushv/str <src>, <wid>

Convert a vector to a string and push the string to the string stack.

* %putc/str/v <functor-label>, <muxr>, <base>

Extract a vector byte from the thread vector space and write it to a
character of the string variable at <functor-label>. This is
basically an implementation of <string>.putc(<muxr>, <val>) where
<val> is the 8-bit vector at <base> in the thread space.

* %release/net <functor-label>, <base>, <width>
* %release/reg <functor-label>, <base>, <width>

Release the force on the signal that is represented by the functor
<functor-label>.  The force was previously activated with a %force/v
statement.  If no force was active on this functor the statement does
nothing. The %release/net sends to the labeled functor the release
command with net semantics: the unforced value is propagated to the
output of the signal after the release is complete. The %release/reg
sends the release command with reg semantics: the signal holds its
forced value until another value propagates through.

The <base> and <width> are used to determine what part of the signal
will be released. For a full release the <base> is 0 and <width> is
the entire signal width.

* %release/wr <functor-label>, <type>

Release the force on the real signal that is represented by the functor
<functor-label>.  The force was previously activated with a %force/wr
statement. The <type> is 0 for nets and 1 for registers. See the other
%release commands above.

* %replicate <count>

Pop the vec4 value, replicate it <count> times, then push the
result. In other words, push the concatenation of <count> copies.
See also the %concat instruction.

* %set/dar <var-label>, <bit>, <wid>
* %set/dar/obj <index>, <bit>, <wid>

The "%set/dar" opcode sets a vector to a word of the dynamic
array. Index register 3 contains the word address within the dynamic
array, and <bit>,<wid> specifies the thread vector to be written.

The "%set/dar/obj" opcode is similar, except that it sets elements of
a dynamic array that is in the top of the object stack. Instead of
using a fixed index register, use the register addressed by <index>.

* %set/dar/obj/real <index>
* %set/dar/obj/str <index>

The "%set/dar/obj/real" opcode writes the top value of the real-value
stack to the element, at the index held in index register <index>, of
the dynamic array on the top of the object stack. This does NOT pop the
real value off the stack. The intent is that this value may be written
to several elements.

The "%set/dar/obj/str" opcode does the same but for string values and
uses the string stack.

* %set/v <var-label>, <bit>, <wid> (XXXX Old definition)

This sets a vector to a variable, and is used to implement blocking
assignments. The <var-label> identifies the variable to receive the
new value. Once the set completes, the value is immediately available
to be read out of the variable. The <bit> is the address of the thread
register that contains the LSB of the vector, and the <wid> is the
size of the vector. The width must exactly match the width of the
signal.

* %set/av <array-label>, <bit>, <wid> (XXXX Old definition)

This sets a thread vector to an array word. The <array-label>
addresses an array device, and the <bit>,<wid> describe a vector to be
written. Index register 3 contains the address of the word within the
array.

The base of a part select is retrieved from index register 1. The
width is implied from the <wid> that is the argument. This is the part
*within* the word.

The address (in canonical form) is precalculated and loaded into index
register 3. This is the address of the word within the array.

* %set/x0 <var-label>, <bit>, <wid>

This sets the part of a signal vector, the address calculated by
using the index register 0 to index the base within the vector of
<var-label>. The destination must be a signal of some sort. Otherwise,
the instruction will fail.

The addressing is canonical (0-based) so the compiler must figure out
non-zero offsets, if any. The width is the width of the part being
written. The other bits of the vector are not touched.

The index may be signed, and if less than 0, the beginning bits are
not assigned. Also, if the bits go beyond the end of the signal, those
bits are not written anywhere.


* %shiftl/i0 <bit>, <wid> (XXXX Old implementation)

This instruction shifts the vector left (towards more significant
bits) by the amount in index register 0. The <bit> is the address of
the LSB of the vector, and <wid> the width of the vector. The shift is
done in place. Zero values are shifted in.

For a negative shift the value is padded with 'bx.

* %shiftr/i0 <bit>, <wid> (XXXX Old implementation)
* %shiftr/s/i0 <bit>, <wid> (XXXX Old implementation)

This instruction shifts the vector right (towards the less significant
bits) by the amount in the index register 0. The <bit> is the address
of the LSB of the vector, and <wid> is the width of the vector. The
shift is done in place.

%shiftr/i0 is an unsigned down shift, so zeros are shifted into the
top bits. %shiftr/s/i0 is a signed shift, so the value is sign-extended.

For a negative shift %shiftr/i0 will pad the value with 'bx.

* %shiftl <idx>
* %shiftr <idx>
* %shiftr/s <idx>

These instructions shift the top value in the vec4 stack left (towards
MSB) or right, possibly signed. The <idx> is the address of the index
register that contains the amount to shift.
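
A minimal sketch of the three shifts on an MSB-first 4-state string.
The shift amount, which the opcode reads from index register <idx>, is
passed in as a plain int. The sketch assumes zeros enter on %shiftl and
%shiftr, the MSB is copied on %shiftr/s, and negative amounts are not
modelled. The function name is hypothetical.

```python
# Hypothetical sketch of %shiftl, %shiftr and %shiftr/s.

def shift_vec4(vec: str, amount: int, op: str) -> str:
    """Shift vec by amount bits; op is "shiftl", "shiftr" or "shiftr/s"."""
    if amount < 0:
        raise ValueError("negative shift amounts are not modelled")
    width = len(vec)
    amount = min(amount, width)  # shifting past the width clears everything
    if op == "shiftl":
        return vec[amount:] + "0" * amount  # zeros enter at the LSB
    if op == "shiftr":
        return "0" * amount + vec[:width - amount]  # zeros enter at the MSB
    if op == "shiftr/s":
        return vec[0] * amount + vec[:width - amount]  # copy the sign bit
    raise ValueError("unknown shift: " + op)
```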

* %split/vec4 <wid>

Pull the top vec4 vector from the stack and split it into two
parts. Split off <wid> bits from the LSB, then push the remaining bits
of the original (the MSB) back to the stack. Then push the split off
LSB vector.

The <wid> must be less than the width of the original, unsplit vector.
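
The push order matters: after the split, the LSB part is on top of the
stack. A hypothetical Python sketch with a list-based stack of MSB-first
strings:

```python
# Hypothetical sketch of %split/vec4 <wid>.

def split_vec4(vec4_stack: list, wid: int) -> None:
    value = vec4_stack.pop()
    if wid >= len(value):
        raise ValueError("<wid> must be less than the vector width")
    vec4_stack.append(value[:len(value) - wid])  # remaining MSB part
    vec4_stack.append(value[len(value) - wid:])  # split-off LSB part, on top
```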

* %store/obj <var-label>

This pops the top of the object stack and writes it to the object
variable given by the label.

See also %load/obj.

* %store/prop/obj <index>
* %store/prop/r <index>
* %store/prop/str <index>
* %store/prop/v <index>, <bit>, <wid>

The %store/prop/r pops a real value from the real stack and stores it
into the property number <index> of a cobject in the top of the
object stack. The cobject is NOT popped.

The %store/prop/obj pops an object from the top of the object stack,
then writes it to the property number <index> of the cobject now on
top of the object stack. The cobject is NOT popped.

* %store/real <var-label>
* %store/reala <var-label>, <index>

This pops the top of the real value stack and writes it to the real
variable given by the label.

The reala version is similar, but writes to a real array using the
index in the index register <index>.

* %store/str <var-label>
* %store/stra <array-label>, <index>
* %store/dar/r <var-label>
* %store/dar/str <var-label>

The %store/str instruction pops the top of the string stack and writes
it to the string variable.

The %store/stra targets an array.

The %store/dar/str is similar, but the target is a dynamic array of
strings. The index is taken from signed index register 3.

* %store/vec4 <var-label>, <offset>, <wid>
* %store/vec4a <var-label>, <addr>, <offset>

Store a logic vector into the variable. The value (and its width) is
popped off the top of the stack and written to the variable. The value
is then optionally truncated to <wid> bits and assigned to the
variable. It is an error for the value to be fewer than <wid>
bits. The <offset> is the index register that contains a part offset
for writing into a part of the variable.

The %store/vec4a is similar, but the target is an array of vec4, the
<addr> is an index register that contains the canonical address, and
the <offset> is an index register that contains the vector part
offset.

Either index register operand may be 0, which means a zero value
rather than the contents of index register 0.

NOTE: The <wid> is not necessary, and should be removed.
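
A minimal sketch of the %store/vec4 write, under stated assumptions.
The variable is a Python list of bits, LSB first. The part offset has
already been read from the index register, and is 0 when that operand
is 0. Truncation keeps the low <wid> bits, and bits that land outside
the variable are dropped, as described for %set/x0. The name is
hypothetical.

```python
# Hypothetical sketch of %store/vec4 <var-label>, <offset>, <wid>.

def store_vec4(var_bits: list, value: str, offset: int, wid: int) -> None:
    """Write the MSB-first value into LSB-first var_bits at offset."""
    if len(value) < wid:
        raise ValueError("value is narrower than <wid>")
    value = value[len(value) - wid:]  # keep the low <wid> bits (assumed)
    for i, bit in enumerate(reversed(value)):
        pos = offset + i
        if 0 <= pos < len(var_bits):  # out-of-range bits dropped (assumed)
            var_bits[pos] = bit
```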

* %sub <bit-l>, <bit-r>, <wid> (XXXX Old version)

This instruction arithmetically subtracts the right vector out of the
left vector. It accomplishes this by adding to the left vector 1 plus
the 1s complement of the right vector. The carry value is dropped, and
the result, placed in <bit-l>, is the subtraction of <bit-r> from the
input <bit-l>. Both vectors have the same width. If any bits in either
operand are x, then the entire result is x.

See also the %add instruction.
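
The "1 plus the 1s complement" method can be checked with a short Python
sketch on MSB-first 4-state strings. Masking to the operand width is
what drops the carry. The sketch also treats z like x, which is an
assumption that matches the %add description. The function name is
hypothetical.

```python
# Hypothetical sketch of the old %sub: left + ~right + 1, carry dropped.

def vec4_sub(left: str, right: str) -> str:
    width = len(left)
    if len(right) != width:
        raise ValueError("operands must have the same width")
    if any(bit in "xz" for bit in left + right):
        return "x" * width
    mask = (1 << width) - 1
    ones_complement = ~int(right, 2) & mask
    total = int(left, 2) + ones_complement + 1
    return format(total & mask, "0{}b".format(width))  # drop the carry
```

For example, 5 - 3 in 4 bits is 0101 + 1100 + 1 = 10010, and dropping
the carry leaves 0010.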

* %sub

This instruction subtracts vec4 values. The right value is popped from
the vec4 stack, then the left value is popped. The right is subtracted
from the left, and the result pushed.

See also the %add instruction.

* %subi <bit-l>, <imm>, <wid>

This instruction arithmetically subtracts the immediate value from the
left vector. The <imm> value is a 16-bit unsigned value zero-extended to
the <wid> of the left vector. The result replaces the left vector.

See also the %addi instruction.


* %sub/wr

This instruction operates on real values on the real stack. The right
value is popped, the left value is popped, the right is subtracted
from the left, and the result pushed.

* %substr <start>, <end>

This instruction takes the substring of the top string in the string
stack. This implements the SystemVerilog style substring. The string
stack is popped and replaced with the result.
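
SystemVerilog defines str.substr(i, j) as the characters at positions i
through j inclusive. It returns the empty string when i < 0, j < i, or
j >= str.len(). The Python sketch below assumes %substr follows those
rules. It takes the <start> and <end> values as plain ints, and the
function name is hypothetical.

```python
# Hypothetical sketch of SystemVerilog substr semantics for %substr.

def sv_substr(s: str, start: int, end: int) -> str:
    if start < 0 or end < start or end >= len(s):
        return ""  # out-of-range selections give the empty string
    return s[start:end + 1]  # end is inclusive
```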

* %substr/v <bit-l>, <sel>, <wid>

This instruction extracts the substring of the top string in the string
stack and delivers the result to vector space. The <bit-l>,<wid> part is
the location where the result goes, and <sel> is the index register
that holds the index. This is the general method for getting string
values into the vector space. The string value is NOT popped.


* %test_nul <var-label>

This instruction tests the contents of the addressed variable to see
if it is null. If it is, set flag bit 4 to 1. Otherwise, set flag bit
4 to 0.

This is intended to implement the SystemVerilog expression
(<var>==null), where <var> is a class variable.

* %vpi_call <name> [, ...] {<vec4> <real> <str>}

This instruction makes a call to a system task that was declared using
VPI. The operands are compiled down to a vpiHandle for the call. The
instruction contains only the vpiHandle for the call. See the vpi.txt
file for more on system task/function calls.

The {...} part is stack information. This tells the run-time how many
stack items the call uses so that it knows how many to pop off the
stack when the call returns.

* %vpi_func <file> <line> <name> [, ...] {<vec4> <real> <str>}
* %vpi_func/r <file> <line> <name> [, ...] {<vec4> <real> <str>}

This instruction is similar to %vpi_call, except that it is for
calling system functions. The difference here is the return value from
the function call is pushed onto the appropriate stack. When the VPI
code writes its return value by the normal means, that value is what
gets pushed.

The {...} part is stack information. This tells the run-time how many
stack items the call uses from each stack so that it knows how many to
pop off the stack when the call returns. The function call will pop
the real and string stacks, and will push any return value.


* %wait <functor-label>

When a thread executes this instruction, it places itself in the
sensitive list for the addressed functor. The functor holds all the
threads that await the functor. When the defined sort of event occurs
on the functor, a thread schedule event is created for all the threads
in its list and the list is cleared.
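
The wait-list mechanism can be sketched as a small Python class. The
class, its methods, and the schedule callback are hypothetical and stand
in for the run-time's functor and scheduler. The list is swapped out
before scheduling, so a thread that waits again while it is being
scheduled joins a fresh list instead of being lost.

```python
# Hypothetical model of a functor's %wait list; not the vvp source.

class WaitableFunctor:
    def __init__(self) -> None:
        self.waiting = []

    def wait(self, thread) -> None:
        """What %wait does: put the thread on this functor's list."""
        self.waiting.append(thread)

    def trigger(self, schedule) -> None:
        """The event occurred: schedule every waiter, then clear the list."""
        threads, self.waiting = self.waiting, []
        for thread in threads:
            schedule(thread)
```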

* %wait/fork

This instruction puts the current thread to sleep until all the detached
children have finished executing. The last detached child is responsible
for restarting the parent when it finishes.

* %xnor

This instruction pops two vectors from the vec4 stack, does a bitwise
exclusive nor (~^) of the vectors, and pushes the result. The truth
table for the xnor is:

	0 xnor 0 --> 1
	0 xnor 1 --> 0
	1 xnor 0 --> 0
	1 xnor 1 --> 1
	otherwise    x


* %xor

This instruction pops two vectors from the vec4 stack, does a bitwise
exclusive or (^) of the vectors, and pushes the result. The truth
table for the xor is:

	0 xor 0 --> 0
	0 xor 1 --> 1
	1 xor 0 --> 1
	1 xor 1 --> 0
	otherwise   x
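
Both exclusive-or tables above give a known result only when both input
bits are 0 or 1. In every other case the result is x. The hypothetical
Python sketch below implements %xor per bit, derives %xnor as its
inverse, and applies either one across MSB-first 4-state strings.

```python
# Hypothetical sketch of %xor and %xnor on MSB-first 4-state strings.

def xor_bit(a: str, b: str) -> str:
    if a in ("0", "1") and b in ("0", "1"):
        return "0" if a == b else "1"
    return "x"  # any x or z input gives x

def xnor_bit(a: str, b: str) -> str:
    return {"0": "1", "1": "0"}.get(xor_bit(a, b), "x")

def vec4_bitwise(left: str, right: str, bit_op) -> str:
    """Apply a per-bit operator to two equal-width vectors."""
    if len(left) != len(right):
        raise ValueError("operands must have the same width")
    return "".join(bit_op(a, b) for a, b in zip(left, right))
```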


/*
 * Copyright (c) 2001-2009 Stephen Williams (steve@icarus.com)
 *
 *    This source code is free software; you can redistribute it
 *    and/or modify it in source code form under the terms of the GNU
 *    General Public License as published by the Free Software
 *    Foundation; either version 2 of the License, or (at your option)
 *    any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program; if not, write to the Free Software
 *    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 */