add docs and add trigger config for logic analyzer

Fischer Moseley 2023-04-17 17:57:26 -04:00
parent 3400ea63c8
commit 870d299c74
9 changed files with 312 additions and 48 deletions

113
doc/block_memory_core.md Normal file

@ -0,0 +1,113 @@
## Usage
### Configuration
The block memory core can be included in your configuration just like any other core. Only the `width` and `depth` parameters are needed:
```yaml
---
cores:
  my_block_memory:
    type: block_memory
    width: 12 # (1)
    depth: 16384
```
1. If your BRAM is more than 16 bits wide, check out the section on [Synchronicity](#synchronicity) and make sure Manta's behavior is compatible with your project.
### Verilog
Internally this creates a dual-port BRAM, connects one end to Manta's internal bus, and exposes the other to the user:
```systemverilog
manta manta_inst (
    .clk(clk),
    .rx(rx),
    .tx(tx),
    .my_block_memory_clk(),
    .my_block_memory_addr(),
    .my_block_memory_din(),
    .my_block_memory_dout(),
    .my_block_memory_we());
```
These ports are free to connect to whatever logic the user desires.
### Python
Just like with the other cores, interfacing with the BRAM with the Python API is simple:
```python
from manta import Manta
m = Manta('manta.yaml')
m.my_block_memory.write(addr=38, data=600)
m.my_block_memory.write(addr=0x1234, data = 0b100011101011)
m.my_block_memory.write(0x0612, 0x2001)
foo = m.my_block_memory.read(addr=38)
foo = m.my_block_memory.read(addr=1234)
foo = m.my_block_memory.read(0x0612)
```
Reading/writing in batches is also supported. This is recommended where possible, as reads are massively sped up by performing them in bulk:
```python
addrs = list(range(0, 1234))
datas = list(range(1234, 2468))
m.my_block_memory.write(addrs, datas)
foo = m.my_block_memory.read(addrs)
```
### Examples
A Block Memory core is used in the [video_sprite](https://github.com/fischermoseley/manta/blob/main/examples/nexys_a7/video_sprite) example. This uses the core to store a 128x128 image sprite in 12-bit color, and outputs it to a VGA display at 1024x768. The sprite contents can be filled with an arbitrary image using the [send_image.py](https://github.com/fischermoseley/manta/blob/main/examples/nexys_a7/video_sprite/send_image.py) python script.
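As a rough sketch of how such a script might prepare its data (this is not the actual `send_image.py`; the `pack_rgb444` and `image_to_writes` helpers and the red-green-blue nibble order are assumptions made for illustration), 8-bit-per-channel pixels can be reduced to 12-bit color and paired with addresses for a batch write:

```python
def pack_rgb444(r, g, b):
    """Reduce an 8-bit-per-channel pixel to 12 bits (4 bits per channel).

    Assumed layout: red in bits [11:8], green in [7:4], blue in [3:0].
    """
    return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)


def image_to_writes(pixels):
    """Turn a flat, row-major list of (r, g, b) tuples into address and data lists."""
    addrs = list(range(len(pixels)))
    datas = [pack_rgb444(r, g, b) for (r, g, b) in pixels]
    return addrs, datas


# A 128x128 sprite fills 16384 entries, matching the `depth` in the configuration above.
pixels = [(255, 0, 0)] * (128 * 128)  # placeholder data: a solid red sprite
addrs, datas = image_to_writes(pixels)

# With hardware attached, the batch write from the Python section would follow:
# m.my_block_memory.write(addrs, datas)
```

Building both lists up front keeps the transfer to a single batch call, which is the pattern recommended above.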
## Under the Hood
Each Block Memory core is actually a set of 16-bit wide BRAMs with their ports concatenated together, with any spare bits masked off. Here's a diagram:
This has one major consequence: if the core doesn't have a width that's an exact multiple of 16, Vivado will throw some warnings during synthesis as it optimizes out the unused bits. This is expected behavior (and rather convenient, actually).
The warnings are a little annoying, but not having to manually deal with the unused bits simplifies the implementation immensely - no Python is needed to generate the core, and it'll configure itself just based on Verilog parameters. This turns the block memory core from a complicated beast requiring a bunch of conditional instantiation in Python to a simple ~_100 line_ [Verilog file](https://github.com/fischermoseley/manta/blob/main/src/manta/block_memory.v).
### Address Assignment
Since each $n$-bit wide block memory is actually $\lceil n/16 \rceil$ BRAMs under the hood, addressing the BRAMs correctly from the bus is important. BRAMs are organized such that the 16-bit words that make up each entry in the Block Memory core are next to each other in bus address space. For instance, if one were to configure a core of width 34, then the memory map would be:
```
bus address       : bram address
BUS_BASE_ADDR + 0 : address 0, bits [0:15]
BUS_BASE_ADDR + 1 : address 0, bits [16:31]
BUS_BASE_ADDR + 2 : address 0, bits [32:33]
BUS_BASE_ADDR + 3 : address 1, bits [0:15]
BUS_BASE_ADDR + 4 : address 1, bits [16:31]
...
```
corresponding to each
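The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from Manta itself; the `bus_addresses` and `bit_range` helpers are hypothetical names:

```python
import math

BUS_WIDTH = 16  # width of Manta's data bus


def bus_addresses(base_addr, width, entry):
    """Return the bus addresses holding one `width`-bit entry, lowest bits first."""
    words_per_entry = math.ceil(width / BUS_WIDTH)
    first = base_addr + entry * words_per_entry
    return [first + i for i in range(words_per_entry)]


def bit_range(width, word_index):
    """Return the (low, high) bit indices of an entry covered by one 16-bit word."""
    low = word_index * BUS_WIDTH
    high = min(low + BUS_WIDTH, width) - 1
    return low, high
```

For the 34-bit example, `bus_addresses(0, 34, 1)` gives offsets 3, 4, and 5 from the base address, and `bit_range(34, 2)` gives `(32, 33)`, matching the rows of the table.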
### Synchronicity
Since Manta's [data bus](../system_architecture) is only 16 bits wide, it's only possible to manipulate the BRAM core in 16-bit increments. This means that if you have a BRAM that's ≤16 bits wide, you'll only need to issue a single bus transaction to read/write one entry in the BRAM. However, if you have a BRAM that's >16 bits wide, you'll need to issue a bus transaction to update each 16-bit slice of it. For instance, updating a single entry in a 33-bit wide BRAM would require sending 3 messages to the FPGA: one for bits 0-15, another for bits 16-31, and one for bit 32. If your application expects each BRAM entry to update instantaneously, this could be problematic. Here are some examples:
!!! warning "Choice of interface matters here!"
The interface you use (and to a lesser extent, your operating system) will determine the space between bus transactions. For instance, 100Mbit Ethernet is roughly a thousand times faster than 115200bps UART, so issuing three bus transactions will take roughly a thousandth of the time.
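To make the slicing concrete, here is a minimal sketch of how a wide entry breaks into 16-bit bus words and back. The `split_into_words` and `join_words` helpers are hypothetical and are not part of Manta's API; the lowest bits go first, matching the memory map in the Address Assignment section:

```python
def split_into_words(value, width, word_bits=16):
    """Split a `width`-bit value into bus-sized words, least significant word first."""
    n_words = -(-width // word_bits)  # ceiling division
    mask = (1 << word_bits) - 1
    return [(value >> (i * word_bits)) & mask for i in range(n_words)]


def join_words(words, word_bits=16):
    """Reassemble a value from its words, least significant word first."""
    value = 0
    for i, word in enumerate(words):
        value |= word << (i * word_bits)
    return value


# A 33-bit entry needs three bus transactions, one per word.
words = split_into_words(0x1FFFF0001, width=33)
```

Each element of `words` corresponds to one bus transaction, so anything that touches the BRAM between the first and last transaction can see, or produce, a partially updated entry.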
### Example 1 - ARP Caching
For instance, if you're making a network interface and you'd like to peek at your ARP cache that lives in a BRAM, it'll take three bus transactions to read each 48-bit MAC address. This will take time, during which your BRAM cache could update, leaving you with 16-bit slices that correspond to different states of the cache.
In a situation like this, you might want to pause writes to your BRAM while you dump its contents over serial. Implementing a flag to signal when a read operation is underway is simple - adding an [IO core](../io_core) to your Manta instance would accomplish this. You'd assert the flag in Python which disables writes to the user port on the FPGA, perform your reads, and then deassert the flag.
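One way to structure the host side of this is a context manager that asserts the flag, runs the reads, and always deasserts the flag afterwards. This is a sketch under assumptions: how the IO core is driven from Python isn't covered here, so `set_flag` stands in for whatever call sets your flag, and `writes_paused` is a hypothetical helper:

```python
from contextlib import contextmanager


@contextmanager
def writes_paused(set_flag):
    """Assert a pause flag for the duration of a block, then always deassert it."""
    set_flag(1)  # ask the FPGA logic to stop writing to the user port
    try:
        yield
    finally:
        set_flag(0)  # resume writes even if a read raised an exception


# Stand-in for hardware so the sketch runs on its own: record each flag value.
flag_history = []

with writes_paused(flag_history.append):
    # With hardware attached, the reads would go here, for example:
    # entries = m.my_block_memory.read(addrs)
    pass
```

Using `try`/`finally` means a failed read can't leave the cache frozen on the FPGA.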
### Example 2 - Neural Network Accelerator
This problem would also arise if you were making a NN accelerator, with 32-bit weights stored in a BRAM updated by the host machine. Each entry would need two write operations, and during the time between the first and second write, the entry would contain 16 bits from the new weight and 16 bits from the old one. This may not be desirable - depending on what you do with your inference results, running the network with the invalid weight might be problematic.
If you can pause inference, then the flag-based solution with an IO core described in the prior example could work. However, if you cannot pause inference, you could use a second BRAM as a cache. Run inference off one BRAM, and write new weights into another. Once all the weights have been written, assert a flag with an IO Core, and switch the BRAM that weights are obtained from. This guarantees that the BRAM contents are always valid.


@ -1,5 +1,7 @@
## Getting Started
Manta is installed with `pip3 install mantaray`. Or at least it will be, once it's out of alpha. For now, it's installable with `pip install -i https://test.pypi.org/simple/ mantaray`, which just pulls from the PyPI testing registry.
Manta is installed with `pip3 install mantaray`.
Or at least it will be, once it's out of alpha. For now, it's installable with `pip install -i https://test.pypi.org/simple/ mantaray`, which just pulls from the PyPI testing registry.
## Examples
Examples can be found under `examples/`. These target the Xilinx Series 7 FPGAs on the [Nexys A7](https://digilent.com/reference/programmable-logic/nexys-a7/start)/[Nexys4 DDR](https://digilent.com/reference/programmable-logic/nexys-4-ddr/start) and the Lattice iCE40 on the [Icestick](https://www.latticesemi.com/icestick).


@ -0,0 +1,16 @@
window.MathJax = {
tex: {
inlineMath: [["\\(", "\\)"]],
displayMath: [["\\[", "\\]"]],
processEscapes: true,
processEnvironments: true
},
options: {
ignoreHtmlClass: ".*|",
processHtmlClass: "arithmatex"
}
};
document$.subscribe(() => {
MathJax.typesetPromise()
})


@ -1,65 +1,92 @@
# Logic Analyzer
This emulates the look and feel of a logic analyzer, both benchtop and integrated. These work by continuously sampling a set of digital signals, and then when some condition (the _trigger_) is met, recording these signals to memory, which are then read out to the user.
This emulates the look and feel of a logic analyzer, both benchtop and integrated. These work by continuously sampling a set of digital signals, and then when some condition (the _trigger_) is met, recording these signals to memory, which are then read out to the user.
Manta works exactly the same way, and the behavior of the logic analyzer is defined entirely in the Manta configuration file. Here's an example:
## Configuration
```yaml
---
logic_analyzer:
sample_depth: 4096
clock_freq: 100000000
cores:
my_logic_analyzer:
type: logic_analyzer
sample_depth: 4096
probes:
larry: 1
curly: 1
moe: 1
shemp: 4
triggers:
- larry && curly && ~moe
probes:
larry: 1
curly: 1
moe: 1
shemp: 4
uart:
baudrate: 115200
port: "/dev/tty.usbserial-2102926963071"
data: 8
parity: none
stop: 1
timeout: 1
triggers:
- moe RISING
- curly FALLING
```
There's a few parameters that get configured here, including:
## Probes
### Sample Depth
Probes are the signals read by the core. These are meant to be connected to your RTL design when you instantiate your generated copy of Manta. These can be given whatever name and width you like (within reason). You can have up to ~32k probes in your design.
Which is just how many samples are saved in the capture. Having a larger sample depth will use more resources on the FPGA, but show what your probes are doing over a longer time.
## Sample Depth
### Probes
Sample depth controls how many samples of the probes get read into the buffer.
Probes are the signals you're trying to observe with the Logic Analyzer core. Whatever probes you specify in the configuration will be exposed by the `manta` module, which you then connect to your design in Verilog. Each probe has a name and a width, which is the number of bits wide it is.
## Triggers
### Triggers
Triggers are things that will cause the logic analyzer core to capture data from the probes. These get specified as a Verilog expression, and are partially reconfigurable on-the-fly. This will get elaborated on more as it's implemented, but if your trigger condition can be represented as a sum-of-products with each product being representable as an operator from the list [`==`, `!=`,`>`, `<`,`>=`, `<=`, `||`,`&&`, `^`] along with a configurable register and a probe, you won't need to rebuild the bitstream to update the trigger condition. Whew, that was a mouthful.
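The configuration above also uses short trigger statements such as `moe RISING`. As a sketch of how a statement of that form could be decoded, the snippet below reuses the operation codes from `configure_trigger_conditions` in this commit. The `parse_trigger` helper itself is hypothetical, and the `shemp GT 3` statement is only an illustration of the three-part form:

```python
# Operation codes as listed in `configure_trigger_conditions`.
OPERATIONS = {
    "DISABLE": 0, "RISING": 1, "FALLING": 2, "CHANGING": 3,
    "GT": 4, "LT": 5, "GEQ": 6, "LEQ": 7, "EQ": 8, "NEQ": 9,
}
OPS_WITH_NO_ARGS = {"DISABLE", "RISING", "FALLING", "CHANGING"}


def parse_trigger(statement, probe_names):
    """Decode '<probe> <OP>' or '<probe> <OP> <arg>' into (probe, opcode, arg)."""
    parts = statement.split()
    if len(parts) == 2:
        probe, op = parts
        arg = None
        if op not in OPS_WITH_NO_ARGS:
            raise ValueError(f"Operation {op} requires an argument.")
    elif len(parts) == 3:
        probe, op, arg_text = parts
        arg = int(arg_text)
    else:
        raise ValueError(f"Malformed trigger statement: {statement!r}")

    if probe not in probe_names:
        raise ValueError(f"Unknown probe: {probe}")
    if op not in OPERATIONS:
        raise ValueError(f"Unknown operation: {op}")
    return probe, OPERATIONS[op], arg


probes = ["larry", "curly", "moe", "shemp"]
rising = parse_trigger("moe RISING", probes)
greater = parse_trigger("shemp GT 3", probes)
```

Raising `ValueError` with the offending statement makes a typo in the configuration file easier to track down than a bare assertion would.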
## Operating Modes
### Trigger Position
The logic analyzer has a programmable _trigger position_, which sets when probe data is captured relative to the trigger condition being met. This is best explained with a picture:
For instance, setting the trigger position to `100` will cause the logic analyzer to save 100 samples of the probes prior to the trigger condition occurring. Manta uses a default trigger position of `SAMPLE_DEPTH/2`, which positions the data capture window such that the trigger condition is in the middle of it.
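The arithmetic behind the capture window can be sketched as follows. This is an illustration rather than Manta's implementation: `capture_window` is a hypothetical helper, it assumes samples are numbered from the start of acquisition, and it assumes the trigger sample is the first one recorded after the pre-trigger samples. The bounds check mirrors the validation of `trigger_loc` added in this commit:

```python
def capture_window(trigger_sample, sample_depth, trigger_loc=None):
    """Return (first, last) sample indices recorded for a trigger at `trigger_sample`."""
    if trigger_loc is None:
        trigger_loc = sample_depth // 2  # default: trigger in the middle of the window
    if not 0 <= trigger_loc <= sample_depth:
        raise ValueError("Trigger location must lie between 0 and the sample depth.")

    first = trigger_sample - trigger_loc  # `trigger_loc` samples precede the trigger
    last = first + sample_depth - 1
    return first, last
```

With a sample depth of 4096 and a trigger on sample 10000, the default position records samples 7952 through 12047, while a position of `100` records samples 9900 through 13995.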
### Operating Modes
The logic analyzer can operate in a number of modes, which govern what trigger conditions start the capture of data:
* __Single-Shot__: When the trigger condition is met, grab the whole thing.
* __Incremental__: Only pull values when the trigger condition is met. Ignore values received while the trigger condition is not met,
* __Single-Shot__: Once the trigger condition is met, record every subsequent sample until `SAMPLE_DEPTH` samples have been acquired. This is the mode most benchtop logic analyzers run in, so the Logic Analyzer Core defaults to this mode unless configured otherwise.
* __Incremental__: Record samples when the trigger condition is met, and don't record the samples when the trigger condition is not met. This is super useful for applications like audio processing or memory controllers, where there are many system clock cycles between signals of interest.
* __Immediate__: Read the probe states into memory immediately, regardless of if the trigger condition is met.
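The difference between the three modes is easiest to see on a small stream of samples. The following is a software model only, written for illustration: `simulate_capture` and the mode strings are hypothetical names, and the model ignores trigger position for simplicity.

```python
def simulate_capture(samples, trigger, mode, sample_depth):
    """Model which samples each operating mode would record.

    `trigger` is a function that returns True when a sample meets the condition.
    """
    captured = []
    triggered = False
    for sample in samples:
        if len(captured) == sample_depth:
            break  # buffer is full
        if mode == "immediate":
            captured.append(sample)
        elif mode == "incremental":
            if trigger(sample):
                captured.append(sample)
        elif mode == "single_shot":
            triggered = triggered or trigger(sample)
            if triggered:
                captured.append(sample)
        else:
            raise ValueError(f"Unknown mode: {mode}")
    return captured


stream = [0, 1, 5, 2, 7, 3, 8, 1]


def at_least_five(x):
    return x >= 5


single = simulate_capture(stream, at_least_five, "single_shot", 3)
incremental = simulate_capture(stream, at_least_five, "incremental", 3)
immediate = simulate_capture(stream, at_least_five, "immediate", 3)
```

Here single-shot records `[5, 2, 7]` (everything after the first match), incremental records `[5, 7, 8]` (only the matching samples), and immediate records `[0, 1, 5]` (the first samples seen).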
## Holdoff
## Usage
The logic analyzer has a programmable _holdoff_, which sets when probe data is captured relative to the trigger condition being met. For instance, setting the holdoff to `100` will cause the logic analyzer to start recording probe data 100 clock cycles after the trigger condition occurs.
### Capturing Data
Holdoff values can be negative! When this is configured, new probe values are being continuously pushed to the buffer, while old ones are pushed off. This means that the probe data for the last `N` timesteps can be saved, so long as `N` is not larger than the depth of the memory.
Once you have your Logic Analyzer core on the FPGA, you can capture data with:
Manta uses a default holdoff value of `-SAMPLE_DEPTH/2`, which positions the data capture window such that the trigger condition lives in the middle of it. Here's a diagram:
```
manta capture [config file] [LA core] [path] [path]
```
Similarly, a holdoff of `-SAMPLE_DEPTH` would place the trigger condition at the right edge of the trigger window. A holdoff of `0` would place the trigger at the left edge of the window. Positive holdoff would look like this:
If the file `manta.yaml` contained the configuration above, and you wanted to export a .vcd and .mem of the captured data, you would execute:
## Everything Else
```
manta capture manta.yaml my_logic_analyzer capture.vcd capture.mem
```
Manta needs to know what clock frequency you plan on running it at so that it can properly generate the baudrate you desire. It also needs to know what serial port your FPGA is on, as well as how to configure the interface. Right now only standard 8N1 serial is supported by the FPGA.
This will reset your logic analyzer, configure it with the triggers specified in `manta.yaml`, and perform a capture. The resulting .vcd file can be opened in a waveform viewer like [GTKWave](https://gtkwave.sourceforge.net/), and the `.mem` file can be used for playback as described in the following section.
Manta will stuff the capture data into as many files as you provide it on the command line, so if you don't want the `.mem` or `.vcd` file, just omit their paths.
### Playback
The LogicAnalyzerCore has the ability to capture a recording of a set of signals on the FPGA, and then 'play them back' inside a Verilog simulation. This requires generating a small Verilog module that loads a capture from a `.mem` file, which can be done by:
```
manta playback [config file] [LA core] [path]
```
If the file `manta.yaml` contained the configuration above, then running:
```
manta playback manta.yaml my_logic_analyzer sim/playback.v
```
generates a Verilog wrapper at `sim/playback.v`, which can then be instantiated in the testbench in which it is needed. An example instantiation is provided at the top of the output Verilog, so a simple copy-paste into the testbench is all that's necessary to use the module.
## Examples

29
doc/lut_memory_core.md Normal file

@ -0,0 +1,29 @@
A LUT Memory Core is simply a set of registers that live on the bus, and are thus implemented in Look Up Tables (LUTs). Their only connection is to the bus, so they aren't reachable from user code. For bus-tied memory that's interfaceable with user code, consider the [Block Memory Core](../block_memory_core).
LUT Memory Cores are convenient for when the host machine needs to store a small amount of data on the FPGA, accessible only to itself.
I have no idea under what circumstances this would be useful, but perhaps someone with fresher eyes than mine would be able to see something. @Joe, thoughts?
## Configuration
Just like every core, a given LUT Memory core is described in Manta's configuration file:
```yaml
cores:
  my_lut_ram:
    type: lut_ram
    size: 64
```
Each register is 16 bits wide, so the only configuration option is the size of the memory.
## Python
The core can be written to and read from in Python with the following:
```python
m.my_lut_ram.write(addr, data)
foo = m.my_lut_ram.read(addr)
```
## Examples
A LUT Memory core is used in the lut_ram examples, for both the [nexys_a7](https://github.com/fischermoseley/manta/tree/main/examples/nexys_a7/lut_ram) and the [icestick](https://github.com/fischermoseley/manta/tree/main/examples/icestick/lut_ram).

17
doc/tutorial_1.md Normal file

@ -0,0 +1,17 @@
## Welcome Back!
We're going to jump right on in on this one. Today's testing is going to focus on one of the cornerstones of our medium-scale FPGA projects - the BRAM! Manta's been designed primarily as a debugging tool - but more generally its purpose is to shuffle data about. And a BRAM is one of the more useful places on a FPGA that it can go.
In today's exercise, we'll be revisiting our lab03 (popcat pong) code, which used a BRAM to store the contents of an image, which we rendered as a sprite. Here we'll be doing almost exactly the same thing, except we'll be hooking our BRAM up to Manta, which will let us put whatever image we'd like into the BRAM. We'll just be sending data _into_ the BRAM, but we could just as easily pull data out of it - say if we had a VGA camera connected to our board that dumped images into a framebuffer, which we wanted to dump to a host machine.
This should hopefully be nice and quick. Go ahead and grab the starter code from here:
And just like last time, we'll need to create a config file that defines our BRAM - what it's called, how many bits wide the input is, and how many entries it has (depth). Here's an example configuration:
```yaml
mam: bro
```
Go ahead and make a configuration of your own like this, and name it something super creative and interesting. I named mine `manta.yaml`.


@ -11,7 +11,8 @@ cores:
shemp: 4
triggers:
- larry && curly && ~moe
- moe RISING
- curly FALLING
uart:
port: "auto"


@ -36,11 +36,18 @@ theme:
extra_css:
- stylesheets/extra.css
extra_javascript:
- javascripts/mathjax.js
- https://polyfill.io/v3/polyfill.min.js?features=es6
- https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
markdown_extensions:
- pymdownx.highlight:
anchor_linenums: true
line_spans: __span
pygments_lang_class: true
- pymdownx.arithmatex:
generic: true
- pymdownx.inlinehilite
- pymdownx.snippets
- pymdownx.superfences
@ -58,7 +65,7 @@ nav:
- Block Memory Core: block_memory_core.md
- LUT Memory Core: lut_memory_core.md
- Tutorials:
- Tutorial 0 (Installation + IO Core): tutorial_0.md
- Tutorial 0 - IO Core: tutorial_0.md
- Tutorial 1 - Logic Analyzer Core: tutorial_1.md
- Tutorial 2 - Block Memory Core: tutorial_2.md
- Developer Reference:


@ -232,7 +232,7 @@ class UARTInterface:
data = []
for i in range(0, len(inbound_bytes), 7):
response = inbound_bytes[i:i+7]
data = self.decode_response(response)
data.append(self.decode_response(response))
return data
@ -462,22 +462,30 @@ class LogicAnalyzerCore:
assert (
"sample_depth" in config
), "Sample depth not found for logic analyzer core."
assert isinstance(config["sample_depth"], int), "Sample depth must be an integer."
self.sample_depth = config["sample_depth"]
# Add probes
assert "probes" in config, "No probe definitions found."
assert len(config["probes"]) > 0, "Must specify at least one probe."
for probe_name, probe_width in config["probes"].items():
assert (
probe_width > 0
), f"Probe {probe_name} is of invalid width - it must be of at least width one."
assert probe_width > 0, f"Probe {probe_name} is of invalid width - it must be of at least width one."
self.probes = config["probes"]
# Add triggers
assert "triggers" in config, "No triggers found."
assert len(config["triggers"]) > 0, "Must specify at least one trigger."
self.triggers = config["triggers"]
# Add trigger location
self.trigger_loc = self.sample_depth // 2
if "trigger_loc" in config:
assert isinstance(config["trigger_loc"], int), "Trigger location must be an integer."
assert config["trigger_loc"] >= 0, "Trigger location cannot be negative."
assert config["trigger_loc"] <= self.sample_depth, "Trigger location cannot exceed sample depth."
self.trigger_loc = config["trigger_loc"]
# compute base addresses
self.fsm_base_addr = self.base_addr
@ -639,6 +647,52 @@ class LogicAnalyzerCore:
return ports
#return VerilogManipulator().net_dec(self.probes, "input wire")
    def configure_trigger_conditions(self):
        operations = {
            "DISABLE" : 0,
            "RISING" : 1,
            "FALLING" : 2,
            "CHANGING" : 3,
            "GT" : 4,
            "LT" : 5,
            "GEQ" : 6,
            "LEQ" : 7,
            "EQ" : 8,
            "NEQ" : 9
        }
        ops_with_no_args = ["DISABLE", "RISING" , "FALLING", "CHANGING"]
        # reset all the other triggers
        for addr in range(self.trigger_block_base_addr, self.block_memory_base_addr):
            self.interface.write_register(addr, 0)
        for trigger in self.triggers:
            # determine if the trigger is good
            # most triggers will have 3 parts - the probe, the operation, and the argument
            # this is true unless the operation is DISABLE, RISING, FALLING, or CHANGING
            statement = trigger.split(' ')
            if len(statement) == 2:
                assert statement[1] in ops_with_no_args, "Invalid operation in trigger statement."
                probe_name, op = statement
                op_register = 2*(list(self.probes.keys()).index(probe_name)) + self.trigger_block_base_addr
                self.interface.write_register(op_register, operations[op])
            else:
                assert len(statement) == 3, "Missing information in trigger statement."
                probe_name, op, arg = statement
                op_register = 2*(list(self.probes.keys()).index(probe_name)) + self.trigger_block_base_addr
                arg_register = op_register + 1
                self.interface.write_register(op_register, operations[op])
                self.interface.write_register(arg_register, int(arg))
# functions for actually using the core:
@ -654,15 +708,13 @@ class LogicAnalyzerCore:
state = self.interface.read_register(self.state_reg_addr)
assert state == self.IDLE, "Logic analyzer did not reset to correct state when requested to."
# Configure trigger settings and positions - highkey don't really know how we're going to do this
# for now, let's just trigger on a changing value of the first probe
print(" -> Configuring triggers...")
self.interface.write_register(self.trigger_block_base_addr, 3)
trigger_setting = self.interface.read_register(self.trigger_block_base_addr)
assert trigger_setting == 3, "Trigger did not save the value written to it."
# Configure trigger conditions
print(" -> Configuring trigger conditions...")
self.configure_trigger_conditions()
# Configure the trigger_pos, but we'll skip that for now
# Configure the trigger_loc, but we'll skip that for now
print(" -> Setting trigger location...")
self.interface.write_register(self.trigger_loc_reg_addr, self.trigger_loc)
# Start the capture by pulsing request_start
print(" -> Starting capture...")