ADC support for GW5A-25 chips has been added.
The inputs of this primitive are fixed and do not require routing,
although they can be switched dynamically.
The .CST file also specifies the pins used as signal sources for the
bus0 and bus1 ADC buses.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In the GW5A series, the primitive SemiDual Port BSRAM cannot function
when the width of any of the ports is 32/36 bits - it is necessary to
divide one block into two identical ones, each of which will be
responsible for 16 bits.
Here, we perform such a division and, in addition, ensure that the new
cells resulting from the division undergo the same packing procedure as
the original ones.
Naturally, with some reservations (the AUX attribute is responsible for
this) - in the case of SP, when service elements are added, it makes
sense to do this immediately for 32-bit SP and only then divide.
Also, SDPs are currently being corrected for cases where both ports are
‘problematic’, but it may happen that one port is 32 and the other is,
say, 1/2/4/8/16. This has been left for the future.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In the new series of chips, the SemiDual Port primitive has one RESET
pin instead of two in previous versions - RESETA and RESETB.
Physically, the two pins are still there and both must be connected,
with RESETA being constant.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Over time, it became clear that the special status of corner tiles is
handled in other parts of the toolchain, and in the GW5A chip series, it
began to interfere—in this series, IO can be located in the corners.
So we move the only function (creating VCC and GND) to the extra
function itself, and at the same time create a mechanism for explicitly
specifying the location of these sources in Apicula when necessary.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The GW5A series is interesting—in this particular primitive, the inputs
have been renamed from CLKx to CLKINx. Everything else remains the same,
including functionality.
As an output, we will store in the chip database which prefix the DCS
inputs have.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
PLLA-type PLLs are implemented, which are used in GW5A-25A chips.
These are six powerful PLLs, each of which can generate seven
independent frequencies.
Since these devices have an unusual configuration—their fuse bits are
located outside the main grid and therefore their Bels do not have
specific “correct” coordinates—the extra bel functions mechanism is used
to describe them. But all the complexity falls on the apicula part.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Very rarely (about once a year), the dedicated clock router would
malfunction, issuing an incorrect route.
The reason turned out to be the so-called gate wires to the global clock
wire system from the logic. Among the PIPs for which these wires are
sinks, there are PIPs where the sources are also clock wires.
This leads to the possibility of feeding the clock signal back into the
gate and again into the global clock MUX.
If handled carelessly, this can lead to a complete loop.
But the loop option itself is particularly useful in the case of DCS
(dynamic clock selection) - the fact is that because these primitives
have four clock inputs and each of them could theoretically address all
56 clock sources, but in practice there are not enough wires and the DCS
inputs cannot serve as sinks for all clock sources.
The simplest solution (and the one that currently works) is to use the
gate to re-enter the clock system, but this time changing the clock
source.
This commit explicitly marks wires as gates and removes the possibility
of looping (however unlikely it may be) where a loop is not needed.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The ALUs in the GW5A series have undergone changes compared to previous
chips.
The most significant change is the appearance of an input MUX for
carry — it is now possible to switch between VCC, GND, and COUT of the
previous ALU, as well as generate carry in logic.
The granularity of resource allocation for ALUs has also changed — it is
now possible to use each half of a slice independently for ALUs.
Not all new features are reflected in this commit:
- since there is one CIN MUX for every six ALUs and it only works for
ALUs with index 0, the new granularity is not very useful: the head of
the chain can only be placed in the zero ALU. It is possible to gain one
LUT by allocating ALUs in odd numbers, but we will leave that for the
future.
- using CIN MUX to generate carry in logic is interesting, but we have
not yet been able to get the vendor IDE to generate such a
configuration to figure out which wires are used, so for now we are
leaving the old behavior in logic with the allocation of a specialized
head ALU.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The GW-5A series has 8 flip-flops in a cell instead of 6. These
additional flip-flops can be used if the control network matches that
for the 4th and 5th DFFs in this cell.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Prior to the 5A series, pin functions (GPIO/SSPI/JTAG/DONE/etc) were
switched using fuses. This was done during the binary image formation
stage for loading into the FPGA using the command line keys of the
gowin_pack program.
The 5A series features certain ports that connect to VCC or GND
depending on whether the pin is used as SSPI or GPIO, for example. This
mechanism exists in parallel with fuses, but it is not described
anywhere, nor is there a corresponding primitive.
To generate working images, we have no choice but to simulate this thing
at the nextpnr stage, since VCC/GND routing is required.
For now, two flags are added, responsible for the SSPI and I2C pin
functions.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Preparing to support the 5A series.
Family recognition is added, as well as minor fixes, but base generation
itself is not allowed for GW5 - this gives the ability to test the next
Apicula release and still not break installations for those who simply
specify `HIMBAECHEL_GOWIN_DEVICES = "all"`.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Recognize GW5A family chips.
Construct chip base name for
- GW5A-LV25MG121C1/l0 - TangPrimer 25k
- GW5AT-LV60PG484A - TangMega 60k
- GW5AST-LV138PG484A - TangMega 138k
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Fill in the delays for PIP classes related to HCLK and IODELAY. Also:
- if clock routing fails, we try to use the next fastest mechanism - segment networks;
- fixing harmless typos.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
DLLDLY is the clock delay primitive that adjust the input clock
according to the DLLSTEP signal and outputs the delayed clock.
These primitives are associated with clock pins and are "tapped" between
the output of this IBUF and the clock networks, leaving the possibility
to connect to the original unshifted signal as well, although the latter
is not very practical because it is no longer possible to use fast
wires.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Gowin chips have an interesting mechanism - wires that run vertically
through several rows (at least 10) in each column of the chip. In each
row a particular wire has branches to the left and right, covering on
average 4 neighboring cells in the row. For lack of a better term, I
further call such a wire a segment.
So a segment can provide a direct connection in a local rectangle. There
are no special restrictions on the sinks, so segment networks can be
used for ClockEnable, LocalSetReset, as well as for LUT and DFF inputs.
The sources are not so simple - the sources can be the upper or lower
end of the segment, which in theory can lead to unfortunate consequences
if the signal is applied from both ends.
The matter is complicated by the fact that there are default
connections, i.e. in the absence of any set fuse the segment input is
still connected to something (VCC for example) and to disable the unused
end of the segment you need to set a special combination of fuses.
Taking into account which end of which segment is used is one of the
tasks of this router. In addition, segment ends can physically coincide
with PLL, DSP and BSRAM inputs, which can also lead to unexpected
effects. Some of these things are tracked when generating the base, some
in this router, some when packing in gowin_pack.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Adds the ability to use high-speed clock lines (together with CLKDIV2
type frequency dividers operating on them) as sieve signals for the
CLKIN and CLKFB inputs of the rPLL and PLLVR primitives (these cover the
full range of supported Gowin chips).
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Add I3C io buffer.
A buffer is added that can operate as a normal IOBUF in PUSH-PULL mode
or switch to open-drain IOBUF mode.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Turn a variable into a set of flags
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Adds output (MIPI_OBUF and MIPI_OBUF_A) and input (MIPI_IBUF) primitives
to allow the use of “real” MIPI (not emulation) ports capable of
operating in both HS and LP modes.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Erroneously created wires for specific IOs on the underside of some
chips.
Fixes https://github.com/YosysHQ/nextpnr/issues/1417
Also cosmetic edits.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Add sampling part to IO blocks (input only). This edge detector will
allow to dynamically adjust DDR decoding window in the future.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. FFs placement.
* Allow clusters to be created from FFs and LUTs;
* Immediately create pass-through LUTs from free LUTs adjacent to FF - at the same time ensure alternating use of LUT inputs;
* In case of constant networks, such pass-through LUTs are disconnected from networks altogether;
* Allow FF to be placed directly into SSRAM slides - this is useful when using synchronous reading.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix aux name creation
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Use I3 for pass-trough LUTs
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Implement the EMCU primitive.
Add support for the GW1NSR-4C's embedded Cortex-M3 processor. Since it
uses flash in its own way, we disable additional flash processing for
this case.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix merge.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Add DHCEN primitive.
This primitive allows you to dynamically turn off and turn on the
networks of high-speed clocks.
This is done tracking the routes to the sinks and if the route passes
through a special HCLK MUX (this may be the input MUX or the output MUX,
as well as the interbank MUX), then the control signal of this MUX is
used.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Change the DHCEN binding
Use the entire PIP instead of a wire - avoids normalisation and may also
be useful in the future when calculating clock stuff.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Some Clocks PIPS were not created due to a check for the presence of a
delay class, now all wires are attributed to the class so that there is
no longer any need for this check.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Implement the UserFlash primitive
Some Gowin chips have embedded flash memory accessible from the fabric.
Here we add primitives that allow access to this memory.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix cell creation
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Himbaechel Gowin: Add support for CLKDIV and CLKDIV2
* Himbaechel Gowin: Add support for CLKDIV and CLKDIV2
* Gowin Himbaechel: HCLK Bug fixes and corrections
DQCE and DCS primitives are added.
DQCE allows the internal logic to enable or disable the clock network in
the quadrant. When clock network is disabled, all logic drivern by this
clock is no longer toggled, thus reducing the total power consumtion of
the device.
DCS allows you to select one of four sources for two clock wires (6 and 7).
Wires 6 and 7 have not been used up to this point.
Since "hardware" primitives operate strictly in their own quadrants,
user-specified primitives are converted into one or more "hardware"
primitives as needed.
Also:
- minor edits to make the most of helper functions like connectPorts()
- when creating bases, the corresponding constants are assigned to the
VCC and GND wires, but for now huge nodes are used because, for an
unknown reason, the constants mechanism makes large examples
inoperable. So for now we remain on the nodes.
Compatible with older Apicula databases.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
As the board on the GW1N-1 chip becomes a rarity, its replacement is the
Tangnano1k board with the GW1NZ-1 chip. This chip has a unique mechanism
for turning off power to important things such as OSC, PLL, etc.
Here we introduce a primitive that allows energy saving to be controlled
dynamically.
We also bring the names of some functions to uniformity.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In the images generated by Gowin IDE, the signals for dynamic BSRAM
block selection (BLKSEL[2:0]) are not always connected directly to the
ports - some chips add LUT2, LUT3 or LUT4 to turn these signals into
Clock Enable. Apparently there are chips with an error in the operation
of these ports.
Here we make such a decoder instead of using ports directly.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
It seems that the internal registers on the BSRAM output pins in
READ_MODE=1'b1 (pipeline) mode do not function properly because in the
images generated by Gowin IDE an external register is added to each pin,
and the BSRAM itself switches to READ_MODE=1'b0 (bypass) mode .
This is observed on Tangnano9k and Tangnano20k boards.
Here we repeat this fix.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Add description of BSRAM harness
In some cases, Gowin IDE adds a number of LUTs and DFFs to the BSRAM. Here we are trying to add similar elements.
More details with pictures: https://github.com/YosysHQ/apicula/blob/master/doc/bsram-fix.md
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
When multiplying 36 bits by 36 bits using four 18x18 multipliers, the
sign bits of the higher 18-bit parts of the multipliers were correctly
switched, but what was incorrect was leaving the sign bits of the lower
parts of the multipliers uninitialized. They now connect to VSS.
Addresses https://github.com/YosysHQ/apicula/issues/242
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
If the CLKIN input of the PLL is connected to a special pin, then it
makes sense to try to place the PLL so that it uses a direct implicit
non-switched connection to this pin.
The transfer of information about pins for various purposes has been
implemented (clock input signal, feedback, etc), but so far only CLKIN
is used.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
For the following primitives:
- PADD9
- PADD18
- MULT9X9
- MULT18X18
- MULT36X36
- MULTALU18X18
- MULTALU36X18
- MULTADDALU18X18
- ALU54D
packing and processing of fixed wires between macro and between DSP
blocks is implemented.
Clusters of DSP and macro blocks are processed using custom placement of
cluster elements.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Corrects the situation when it is impossible to use IOBUF with two
IOLOGIC elements at the same time - input and output.
Addresses https://github.com/YosysHQ/nextpnr/issues/1275
This is done by dividing one IOLOGIC Bel into two - input IOLOGIC and
output IOLOGIC plus checking for compatibility of the cells located
there.
At the moment, this check is simple and allows only the combination of
DDR and DDRC primitives.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The following primitives are implemented for the GW1NZ-1 chip:
* pROM - read only memory - (bitwidth: 1, 2, 4, 8, 16, 32).
* pROMX9 - read only memory - (bitwidth: 9, 18, 36).
* SDPB - semidual port - (bitwidth: 1, 2, 4, 8, 16, 32).
* SDPX9B - semidual port - (bitwidth: 9, 18, 36).
* DPB - dual port - (bitwidth: 16).
* DPX9B - dual port - (bitwidth: 18).
* SP - single port - (bitwidth: 1, 2, 4, 8, 16, 32).
* SPX9 - single port - (bitwidth: 9, 18, 36).
Also:
- The creation of databases for GW1NS-2 has been removed - this was not
planned to be supported in Himbaechel from the very beginning and
even examples were not created in apicula for this chip due to the
lack of boards with it on sale.
- It is temporarily prohibited to connect DFFs and LUTs into clusters
because for some reason this prevents the creation of images on lower
chips (placer cannot find the placement), although without these
clusters the images are quite working. Requires further research.
- Added creation of ALU with mode 0 - addition. Such an element is not
generated by Yosys, but it is a favorite vendor element and its
support here greatly simplifies the compilation of vendor netlists.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Now the clock router can place a buffer into the specified network,
which divides the network into two parts: from the source to the buffer,
routing occurs through any available PIPs, and after the buffer to the
sink, only through a dedicated global clock network.
This is made specifically for the Tangnano20k where the external
oscillator is soldered to a regular non-clock pin. But it can be used
for other purposes, you just need to remember that the recipient must be
a CLK input or output pin.
The port/network to set the buffer to is specified in the .CST file:
CLOCK_LOC "name" BUFG;
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In these chips, the midline IOs are still simple, but are no longer just
IOBUF - that is, unlike the GW1N-1 IBUF and OBUF are not obtained by
applying a signal to the OEN input.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
And also fix the clock router to allow (with a warning) non-dedicated
routing in case of false detection of clock wires.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Information about what function (main or auxiliary) the cell performs in
these primitives is transmitted through the tile's extra data. And this
also allows us to remove the calculation of the coordinates of the
auxiliary cell on the go.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
A single mechanism for creating a new type of tile if special functions
are found in the chip database that depend on the coordinates of the
tile.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>