Add CLKDIV — a frequency divider with ratios of 1, 2, 3, 3.5, 4,
5, 6, 7, and 8.
A direct, non-switchable connection to CLKDIV2 makes placement more
difficult — we have to account for CLKDIV2’s occupancy for IOLOGIC and,
if necessary, duplicate the cell, as well as create clusters of CLKDIV
and CLKDIV2.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Implement GW5A HCLK and CLKDIV2.
HCLK pins have been added for the GW5A series, and the placement of
CLKDIV2 primitives has been updated to account for the specific
characteristics of this chip series.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix style.
* Gowin. Fix style.
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* GOWIN. BUGFIX. BSRAM port renaming.
The renumbering of the BSRAM pins has been corrected.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* GOWIN. Comment BSRAM port renaming
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* gowin: add DL-series latch cell support
Teach the himbaechel Gowin backend to recognize and place all 12
DL-series latch primitives onto DFF BEL sites. Latches use the CLK
pin for the gate signal and share placement resources with DFFs.
* gowin: convert latches to DFFs with LATCH attribute during packing
Instead of teaching all DFF infrastructure about 12 DL latch types,
pack_latches() converts them to corresponding DFF types early and sets
a LATCH attribute. This attribute is picked up by gowin_pack to set
REGMODE=LATCH instead of FF.
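
The conversion idea can be sketched in a few lines of Python. This is only an illustration, not the nextpnr C++ code: the `Cell` class, the list of latch names and the rule of deriving the DFF type by swapping the `DL` prefix for `DFF` are assumptions; only the "convert early and set a LATCH attribute" scheme comes from the description above.

```python
from dataclasses import dataclass, field

# Assumed list of the 12 DL-series latch primitive names.
LATCH_TYPES = {
    "DL", "DLE", "DLC", "DLCE", "DLP", "DLPE",
    "DLN", "DLNE", "DLNC", "DLNCE", "DLNP", "DLNPE",
}


@dataclass
class Cell:
    """Hypothetical stand-in for a netlist cell."""
    name: str
    type: str
    attrs: dict = field(default_factory=dict)


def pack_latches(cells):
    """Rewrite latch cells as DFF cells and tag them with LATCH."""
    for cell in cells:
        if cell.type in LATCH_TYPES:
            # Assumption: the DFF counterpart is named by replacing the
            # "DL" prefix with "DFF" (for example DLCE -> DFFCE).
            cell.type = "DFF" + cell.type[2:]
            cell.attrs["LATCH"] = "1"
    return cells


def regmode(cell):
    """Value a later bitstream stage would pick for REGMODE."""
    return "LATCH" if "LATCH" in cell.attrs else "FF"
```

After this pass the rest of the flow only ever sees DFF types, and the attribute alone decides between latch and flip-flop behaviour.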
* gowin: exclude latch gate signals from clock buffer promotion
Latch cells are mapped to DFFs with a LATCH attribute, so their gate
signal drives the CLK port. This caused pack_buffered_nets to promote
the gate signal onto a global clock buffer (BUFG), which has different
timing/initialization behavior and caused the first gate transition
to be lost. Skip CLK pins on cells with the LATCH attribute when
checking for clock users.
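
A rough Python model of the check described above, assuming a net's users are given as `(cell_attrs, port)` pairs. That data layout and the function name are hypothetical; the real logic lives in the C++ packer.

```python
def count_clock_users(net_users):
    """Count users that justify promoting a net to a global clock buffer.

    net_users: iterable of (cell_attrs, port) pairs (hypothetical model).
    """
    count = 0
    for attrs, port in net_users:
        if port != "CLK":
            continue
        if "LATCH" in attrs:
            # CLK carries the gate signal of a latch, not a real clock.
            continue
        count += 1
    return count
```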
* gowin: update latch message to be user friendly.
Dual Port has a defective output register. This only manifests itself at
small data widths and only on -C chips.
That is, Tangprimer20k (GW2A-18) works perfectly, while Tangnano20k
(GW2A-18C) stutters. The same story with GW1N-9 and GW1N-9C.
Fortunately, the fix has long been included in nextpnr for SDP memory,
so all that remains is to call the same function for Dual Port.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
We are fixing a hardware error: in BYPASS mode, Dual Port BSRAM
requires synchronization of the CE and OCE signals for some data widths.
We are also getting rid of port renaming in the loop, though not all of
it yet.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The new multiplier is made from two 27x18 units by switching inputs and
creating a cluster connected via CASO->CASI.
A second pass was required to process the multipliers created on the
fly. The processing of DSP cells was moved into a separate function,
which resulted in a large diff, but in reality there were very few
changes.
An important point is that in the 5A series, there is a gap between
adjacent DSPs in one row. There are still SIA/CASI wires, so the DSPs on
either side of the gap are connected, but the distance between them is
greater than usual. We take this fact into account based on the gap
coordinates from the chip database.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
This primitive occupies one DSP block entirely and can be connected into
complex chains both by arguments (shifting operands from SOA to SIA) and
by results (CASO->CASI cascades).
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. DSP. Implement MULT12x12.
The 5A series DSP differs from previous ones. Many things have been
greatly simplified: there are only two control signals of each type per
cell (2 CLK, 2 CE and 2 RESET), and these signals are now explicitly
specified in the DSP attributes, which makes the automatic assignment
mechanism unnecessary for them.
The DSP occupies 3 cells instead of nine due to the exclusion of 4
low-bit multipliers - now there are only two 12x12. There will naturally
be clusters, but they will be simpler and consist of other primitives.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Implement MULTADDALU12X12.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* gowin: Update arch gen to use msgspec chipdb format
Apycula now uses msgspec MessagePack serialization instead of pickle
for the chipdb files. This change:
- Replaces pickle with msgspec via load_chipdb()
- Changes file extension from .pickle to .msgpack.gz
- Updates grid access patterns for new Device structure where
db.grid[y][x] returns ttyp (int) directly, use db[y, x] for Tile
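
The two access patterns can be illustrated with a small mock. The `Device` and `Tile` classes below are simplified stand-ins written for this example, not the actual Apycula classes, and the tile type numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """Stand-in for a tile description shared by all cells of one type."""
    ttyp: int
    bels: dict = field(default_factory=dict)


class Device:
    """Mock showing grid[y][x] -> int and db[y, x] -> Tile."""

    def __init__(self, grid, tiles):
        self.grid = grid    # grid[y][x] is a tile type id (int)
        self.tiles = tiles  # tile type id -> Tile

    def __getitem__(self, pos):
        y, x = pos
        return self.tiles[self.grid[y][x]]


db = Device(
    grid=[[12, 12], [12, 17]],
    tiles={12: Tile(12), 17: Tile(17, {"LUT0": {}})},
)
ttyp = db.grid[1][1]  # plain integer tile type
tile = db[1, 1]       # full Tile object for the same position
```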
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Update chipdb extension to .msgpack.xz
Apicula switched from gzip to lzma compression for chipdb files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Only one bit per macro is responsible for the bit width of operands. We
add operand width tracking and do not allow operands of different widths
to be combined in a single macro.
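
A minimal sketch of such tracking, assuming a macro simply remembers the width of the first cell placed into it. The `Macro` class and the widths used in the example are hypothetical.

```python
class Macro:
    """Hypothetical macro with one shared operand-width setting."""

    def __init__(self):
        self.width = None  # unset until the first cell is placed
        self.cells = []

    def can_place(self, cell_width):
        """A cell fits if the macro is empty or the widths agree."""
        return self.width is None or self.width == cell_width

    def place(self, name, cell_width):
        """Place the cell if allowed; return whether it was placed."""
        if not self.can_place(cell_width):
            return False
        self.width = cell_width
        self.cells.append(name)
        return True
```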
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Add GW5AST-138C chip.
The ability to perform P&R for the largest GW5A series chip currently
available has been added, which has its own characteristics:
- the need to invert pin function configuration signals - these
signals are not part of the design, but are nextpnr command line
keys for specifying the activation of alternative pin functions such as
I2C.
- some clock PIPs are encoded not by fuses, but by applying VCC/GND to
special inputs. This is also not part of the design and is not a
dynamic clock selection primitive - it is simply an addition to the
fuses.
- added check for DFF and SSRAM placement in upper slots - prior to
this chip, SSRAM was not supported and there was no need for this
check.
- since the chip is divided into two parts in terms of the global
clock network, a flag is introduced to indicate which part the wire
belongs to. This is only requested for clock wires.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix style.
Use C++ type cast.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. BUGFIX. BSRAM SP separation.
The new SP cell must inherit the byte size - 8 or 9 bits.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Byte Enables processing in SP.
Single Port with a data width of 32/36 is internally configured as Dual
Port with 16/18. Even and odd words are processed separately by ports A
and B.
With the advent of byte enable support, it became necessary to switch
these signals differently.
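
To illustrate the idea, here is a small Python model of splitting one 32-bit write with four byte enables into two 16-bit port writes. Which half goes to port A and which to port B is an assumption of this sketch, as are the function names.

```python
def split_sp32(data, byte_en):
    """Split a 32-bit write into two (data16, byte_en2) port writes."""
    assert 0 <= data < (1 << 32) and 0 <= byte_en < (1 << 4)
    port_a = (data & 0xFFFF, byte_en & 0b11)  # assumed: low word
    port_b = (data >> 16, byte_en >> 2)       # assumed: high word
    return port_a, port_b


def apply_write(old16, new16, be2):
    """Merge new16 into old16 byte by byte according to two enable bits."""
    result = old16
    for byte in range(2):
        if (be2 >> byte) & 1:
            mask = 0xFF << (8 * byte)
            result = (result & ~mask) | (new16 & mask)
    return result
```

With byte enables 0b1001, only the lowest byte of port A's word and the highest byte of port B's word are updated.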
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The TLVDS_IBUF_ADC IO primitives have been implemented, which provide a
signal for ADC bus 2. These differential IO primitives also have an
additional input that allows them to be disabled, thereby providing
dynamic switching of the signal source for the ADC.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
ADC support for GW5A-25 chips has been added.
The inputs of this primitive are fixed and do not require routing,
although they can be switched dynamically.
The .CST file also specifies the pins used as signal sources for the
bus0 and bus1 ADC buses.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Since ctx->getArchArgs() no longer returns architecture-specific
arguments, we read the args field directly.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In the GW5A series, the primitive SemiDual Port BSRAM cannot function
when the width of any of the ports is 32/36 bits - it is necessary to
divide one block into two identical ones, each of which will be
responsible for 16 bits.
Here, we perform such a division and, in addition, ensure that the new
cells resulting from the division undergo the same packing procedure as
the original ones.
Naturally, with some reservations (the AUX attribute is responsible for
this) - in the case of SP, when service elements are added, it makes
sense to do this immediately for 32-bit SP and only then divide.
Also, SDPs are currently being corrected for cases where both ports are
‘problematic’, but it may happen that one port is 32 and the other is,
say, 1/2/4/8/16. This has been left for the future.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In the new series of chips, the SemiDual Port primitive has one RESET
pin instead of two in previous versions - RESETA and RESETB.
Physically, the two pins are still there and both must be connected,
with RESETA being constant.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Over time, it became clear that the special status of corner tiles is
handled in other parts of the toolchain, and in the GW5A chip series, it
began to interfere—in this series, IO can be located in the corners.
So we move the only function (creating VCC and GND) to the extra
function itself, and at the same time create a mechanism for explicitly
specifying the location of these sources in Apicula when necessary.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The GW5A series is interesting: in this particular primitive, the inputs
have been renamed from CLKx to CLKINx. Everything else remains the same,
including functionality.
As a result, we now store in the chip database which prefix the DCS
inputs have.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
PLLA-type PLLs are implemented, which are used in GW5A-25A chips.
These are six powerful PLLs, each of which can generate seven
independent frequencies.
Since these devices have an unusual configuration (their fuse bits are
located outside the main grid and therefore their Bels do not have
specific “correct” coordinates), the extra bel functions mechanism is
used to describe them. All the complexity, however, falls on the
Apicula side.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
By replacing the operation of adding the input to itself with a
specially formed LUT, we free up two PIPs.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Very rarely (about once a year), the dedicated clock router would
malfunction, issuing an incorrect route.
The reason turned out to be the so-called gate wires to the global clock
wire system from the logic. Among the PIPs for which these wires are
sinks, there are PIPs where the sources are also clock wires.
This leads to the possibility of feeding the clock signal back into the
gate and again into the global clock MUX.
If handled carelessly, this can lead to a complete loop.
But the loop option itself is particularly useful in the case of DCS
(dynamic clock selection): these primitives have four clock inputs, each
of which could theoretically address all 56 clock sources, but in
practice there are not enough wires, so the DCS inputs cannot serve as
sinks for all clock sources.
The simplest solution (and the one that currently works) is to use the
gate to re-enter the clock system, but this time changing the clock
source.
This commit explicitly marks wires as gates and removes the possibility
of looping (however unlikely it may be) where a loop is not needed.
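
The effect of marking gates can be shown with a toy breadth-first router in Python. The wire names, the graph and the `allow_gate` switch are invented for this sketch; the real dedicated clock router is C++ and considerably more involved.

```python
from collections import deque


def route_clock(pips, src, dst, gates, allow_gate=False):
    """Find a path from src to dst over directed PIPs.

    pips: dict mapping a wire to the wires it can drive
    gates: wires marked as gates from logic into the clock network
    allow_gate: permit re-entering the clock network through a gate
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        wire = queue.popleft()
        if wire == dst:
            path = []
            while wire is not None:
                path.append(wire)
                wire = prev[wire]
            return path[::-1]
        for nxt in pips.get(wire, []):
            if nxt in prev:
                continue  # already reached, never revisit a wire
            if nxt in gates and not allow_gate:
                continue  # gates are off limits unless requested
            prev[nxt] = wire
            queue.append(nxt)
    return None


# Toy graph: MUX can feed the clock back into SPINE through GATE.
PIPS = {
    "SRC": ["SPINE"],
    "SPINE": ["BRANCH", "GATE"],
    "GATE": ["MUX"],
    "MUX": ["DCS_IN", "SPINE"],
    "BRANCH": ["FF_CLK"],
}
GATES = {"GATE"}
```

An ordinary sink is routed without touching the gate, while a DCS input is reachable only when the gate is explicitly allowed.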
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
A programmable on-chip crystal oscillator has been implemented for the
GW5A series.
A critical innovation in this series was the change in the nature of the
OSC output pin—it now belongs to the clock wires, and therefore the
routes must be made with a special global router, as there is no
possibility of using routing through general-purpose PIPs.
At the same time, we are transferring the outputs of all previous
generations of OSC to potential clock wires. At the moment, this will
not affect the way they are routed - they will still end up as segments
as before, but in the future we may optimize the mechanism.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Optimize ALU wiring
Interestingly, although VCC and GND sources are present in each cell,
they cannot be connected directly to all LUT inputs. Instead, additional
PIPs are used.
A very simple ALU optimization: once we detect that one of the inputs is
a constant, we modify the main LUT that describes the ALU function so
that this primitive input is ignored, and then disconnect it from the
network, freeing up the PIP.
For example (unrealistic, since a real ALU LUT has a larger size and
service bits in the middle, etc.), the addition function of A and B when
A = 1 is converted from the general case (A isn't a constant and B isn't a
constant) to a special case:
0110 -> 0011
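
The same transformation, written as a small Python helper for a generic truth table. It assumes that input A is address bit 0 and that the table is written with the highest address first, which reproduces the 0110 -> 0011 example; the function name is made up and real ALU LUTs are larger.

```python
def fix_input(init, n_inputs, index, value):
    """Return a LUT init that ignores input `index`.

    The result behaves as if that input were tied to `value` (0 or 1),
    so the input can be disconnected afterwards.
    """
    out = 0
    for addr in range(1 << n_inputs):
        # Force the chosen input bit to the constant before the lookup.
        forced = (addr & ~(1 << index)) | (value << index)
        out |= ((init >> forced) & 1) << addr
    return out


XOR2 = 0b0110                            # sum bit of A + B
SUM_A_IS_1 = fix_input(XOR2, 2, 0, 1)    # A (bit 0) tied to 1
```

The resulting table depends on B only, so whatever is connected to A no longer matters.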
The renaming of ALU ports for ADD and SUB modes has also been
removed—this has already been done in the chip database as a fixed
change to the ALU LUT.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix the style.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The LUTRAM mode is added to all supported chips at once.
This is essentially an alias for LUT4, so the packing is also moved
before the search for LUT-DFF pairs for possible optimization.
In addition to being the only LUTRAM mode in the GW5A series, the
addition of ROM16 eliminates the need to manually rename the primitive
and its pins when working with files generated by Gowin IDE - a similar
situation occurred with INV, which is essentially LUT1.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The ALUs in the GW5A series have undergone changes compared to previous
chips.
The most significant change is the appearance of an input MUX for
carry — it is now possible to switch between VCC, GND, and COUT of the
previous ALU, as well as generate carry in logic.
The granularity of resource allocation for ALUs has also changed — it is
now possible to use each half of a slice independently for ALUs.
Not all new features are reflected in this commit:
- since there is one CIN MUX for every six ALUs and it only works for
ALUs with index 0, the new granularity is not very useful: the head of
the chain can only be placed in the zero ALU. It is possible to gain one
LUT by allocating ALUs in odd numbers, but we will leave that for the
future.
- using CIN MUX to generate carry in logic is interesting, but we have
not yet been able to get the vendor IDE to generate such a
configuration to figure out which wires are used, so for now we are
leaving the old behavior in logic with the allocation of a specialized
head ALU.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
With the release of Apicula 0.22, the GW5A series gained support for
simple IO, LUTs (including Wide LUTs), and DFFs (including flip-flops 6
and 7 specific to the GW5A series), so we can include the GW5A-25A among
Gowin devices.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The GW5A series has 8 flip-flops in a cell instead of 6. These
additional flip-flops can be used if the control network matches that
for the 4th and 5th DFFs in this cell.
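
A possible shape of such a check, sketched in Python. The representation of a control set as a `(clk, ce, sr)` tuple and the function name are assumptions; checks for DFFs 0 to 5 are not modelled here.

```python
def can_use_extra_dff(slots, idx, ctrl):
    """Check whether DFF `idx` may be placed with control set `ctrl`.

    slots: dict mapping a DFF index to the (clk, ce, sr) tuple of a
           flip-flop already placed in this cell (hypothetical model).
    """
    if idx not in (6, 7):
        return True  # only the GW5A-specific DFFs are checked here
    for other in (4, 5, 6, 7):
        if other != idx and other in slots and slots[other] != ctrl:
            return False  # shared control signals would conflict
    return True
```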
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Prior to the 5A series, pin functions (GPIO/SSPI/JTAG/DONE/etc) were
switched using fuses. This was done during the binary image formation
stage for loading into the FPGA using the command line keys of the
gowin_pack program.
The 5A series features certain ports that connect to VCC or GND
depending on whether the pin is used as SSPI or GPIO, for example. This
mechanism exists in parallel with fuses, but it is not described
anywhere, nor is there a corresponding primitive.
To generate working images, we have no choice but to simulate this thing
at the nextpnr stage, since VCC/GND routing is required.
For now, two flags are added, responsible for the SSPI and I2C pin
functions.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Preparing to support the 5A series.
Family recognition is added, as well as minor fixes, but chip database
generation itself is not allowed for GW5 - this makes it possible to
test the next Apicula release without breaking installations for those
who simply specify `HIMBAECHEL_GOWIN_DEVICES = "all"`.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Recognize GW5A family chips.
Construct chip base name for
- GW5A-LV25MG121C1/l0 - TangPrimer 25k
- GW5AT-LV60PG484A - TangMega 60k
- GW5AST-LV138PG484A - TangMega 138k
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Adds automatic connection of a general-purpose pin to the global clock
network.
The old behaviour, where such nets have to be explicitly specified,
can be activated with the command line key
"--vopt disable_gp_clock_routing".
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Use loop enumeration of PIPs instead of direct name construction for the
upper and lower ends of the segment wire.
Also do not allow clock wires for segments.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>