\section{Modules}
\label{sec:modules}
This section provides an overview of the main modules that are used in
an SRAM. For each module, we will provide both an architectural
description and an explanation of how that design is generated and
used in OpenRAM. The modules described below are provided in the
first release of OpenRAM, but this is by no means an exhaustive list
of the circuits that can be adapted into an SRAM architecture;
refer to Section~\ref{sec:implementation} for more information on
adding different module designs to the compiler.
Data structures for schematic and layout are provided in the
\verb|base| directory. These implement a generic design object and
have many auxiliary functions for routing, pin access, placement,
DRC/LVS, etc. These are discussed further in
Section~\ref{sec:implementation}.
Each module has a corresponding Python class in the
\verb|compiler/modules| directory. These classes are used to generate
both the GDSII layout and spice netlists. A module can consist of
hard library cells (Section~\ref{sec:techdir}), parameterized
cells (Section~\ref{sec:parameterized}), or other modules.
When combining modules at any level of hierarchy, DRC rules for
minimum spacing of metals, wells, etc. must be followed and DRC and
LVS are run by default after each hierarchical module's creation. A
module is responsible for creating its own pins to enable routing
at the next level up in the hierarchy. A module must also define its
height and width assuming a (0,0) offset for the lower-left coordinate
to aid with placement.
\subsection{The Bitcell and Bitcell Array}
\label{sec:bitcellarray}
OpenRAM can work with any cell as the bitcell. This could be a
foundry-provided cell or a user-created design-rule cell for experiments. In addition,
it could be a common 6T cell or it could be replaced with an 8T, 10T,
or other cell, depending on needs.
By default, OpenRAM uses a standard 6T cell as shown in
Figure~\ref{fig:6t_cell}. The cross-coupled inverters hold a single
data bit that can be either driven into or read from the cell by the
bitlines. The access transistors isolate the cell from
the bitlines so that data is not corrupted while the cell is not being
accessed.
\begin{figure}[h!]
\centering
\includegraphics[scale=.9]{figs/cell_6t_schem.pdf}
\caption{Standard 6T cell.}
\label{fig:6t_cell}
\end{figure}
% tiling memory cells
The 6T cells are tiled together in both the horizontal and vertical
directions to make up the memory array.
% keeping it square
It is common practice to keep the aspect ratio of a memory array
roughly ``square'' to ensure that the bitlines and wordlines do not
become too long. If the bitlines are too long, this can increase the
bitline capacitance, slow down the operation and lead to bitline
leakage problems. To make an array ``more square'', multiple words
can share rows by interleaving the bits of each word. The column mux
in Section~\ref{sec:column_mux} is responsible for selecting a subset
of bitcells in a row to extract a word during read and write
operations.
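
As an illustrative sketch of this trade-off, consider a memory of
$N$ words of $W$ bits each, with $M$ words interleaved per row. The
symbols $N$, $W$, and $M$ and the example sizes are introduced here
for illustration and are not OpenRAM parameter names. The array
dimensions are then
\[
  \mbox{rows} = \frac{N}{M}, \qquad \mbox{columns} = W \cdot M .
\]
For example, a hypothetical memory with $N = 1024$ and $W = 32$ is
$1024 \times 32$ without interleaving ($M = 1$), but $256 \times 128$
with $M = 4$, which is much closer to square.
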
% memory cell is a library cell
In OpenRAM, we provide a library cell for the 6T cell that can be
swapped with a fab memory cell, if available. The transistors in the
cell are sized appropriately considering read and write noise margins.
% bitcell and bitcell_array classes
The bitcell class in \verb|modules/bitcell.py| is a single
memory cell and is usually a pre-made library cell.
% bitcell_array
The bitcell\_array class in \verb|modules/bitcell_array.py| dynamically
implements the memory cell array by instantiating the bitcell class
in rows and columns.
% abutment connections
During the tiling process, bitcells are abutted so that all bitlines
and wordlines are connected in the vertical and horizontal directions,
respectively. This is done by using the boundary layer to define the
height and width of the cell. If this is not specified, OpenRAM will
use the bounding box of all shapes as the boundary. The lower-left
corner of the boundary layer should be at (0,0).
% flipping
In order to share supply rails, bitcells are flipped in alternating
rows.
\subsection{Precharge Circuitry}
\label{sec:precharge}
The precharge circuit is depicted in Figure~\ref{fig:precharge} and is
implemented by three PMOS transistors. The input signal to the cell,
clk, enables all three transistors during the first half of a read or
write cycle (i.e. while the clock signal is low). M1 and M2 charge bl
and br to vdd while M3 equalizes the voltages seen between the bitlines.
\begin{figure}[h!]
\centering
\includegraphics[width=5cm]{./figs/precharge_schem.pdf}
\caption{Schematic of a precharge circuit.}
\label{fig:precharge}
\end{figure}
In OpenRAM, the precharge circuitry is dynamically generated using the
parameterized transistor class ptx, which is further discussed in
Section~\ref{sec:ptx}. The offsets of the bitlines and the width of
the precharge cell are equal to the bitcell so that the bitlines are
correctly connected by abutment. The precharge class in
\verb|modules/precharge.py| dynamically generates a single precharge
cell.
\verb|modules/precharge_array.py| creates a row of precharge cells at
the top of a bitcell array.
\subsection{Address Decoders}
\label{sec:address_decoder}
The address decoder takes the binary-encoded row address bits from the
address bus as inputs and asserts a one-hot wordline for the row in which
data is to be read or written. OpenRAM provides a hierarchical address
decoder as the default, but will soon have other options.
The address decoders are created using parameterized gates (pnand2,
pnand3, pinv) and transistors (ptx). This means that the decoders do
not rely on any hard library cells.
\subsubsection{Hierarchical Decoder}
\label{sec:hierdecoder}
A simple 2:4 decoder is shown in Figure~\ref{fig:2:4decoder}. This
decoder computes all of the possible decode values using a single
level of nand gates along with the inverted and non-inverted inputs.
As the decoder size increases, the number of inputs of the nand gates required for
decoding would increase in proportion to the number of bits to be decoded. This
would not be practical for large decoders.
\begin{figure}[h!]
\centering
\includegraphics[scale=.6]{./figs/2t4decoder.pdf}
\caption{Schematic of a simple 2:4 decoder.}
\label{fig:2:4decoder}
\end{figure}
A hierarchical decoder uses two levels of decoding hierarchy to
perform an address decode. The first stage computes predecoded values
while the second stage computes the final decoded values.
Figure~\ref{fig:4 to 16 decoder} shows a 4:16 hierarchical
decoder. The decoder uses two 2:4 decoders for
predecoding and 2-input nand gates and inverters for final decoding to
form the 4:16 decoder.
\begin{figure}[h!]
\centering
\includegraphics[scale=.6]{./figs/4t16decoder.pdf}
\caption{Schematic of 4:16 hierarchical decoder.}
\label{fig:4 to 16 decoder}
\end{figure}
The predecoder generates a total of 8 intermediate signals from the
address bits and their complements. These intermediate signals are in
two groups of 4, one from each decoder. The enumeration of all $4 \times 4$
combinations of predecoded values is used by the final decode to produce the 16
decoded results. As an example, Table~\ref{table:4-16 hierarchical_decoder}
gives the detailed input and output signals for the 4:16 hierarchical
decoder.
\begin{table}[h!]
\begin{center}
\begin{tabular}{| c | c | c | c |}
\hline
A[3:0] & predecoder1 & predecoder2 & Selected WL\\ \hline
0000 & 1000 & 1000 & 0\\ \hline
0001 & 1000 & 0100 & 1\\ \hline
0010 & 1000 & 0010 & 2\\ \hline
0011 & 1000 & 0001 & 3\\ \hline
0100 & 0100 & 1000 & 4\\ \hline
0101 & 0100 & 0100 & 5\\ \hline
0110 & 0100 & 0010 & 6\\ \hline
0111 & 0100 & 0001 & 7\\ \hline
1000 & 0010 & 1000 & 8\\ \hline
1001 & 0010 & 0100 & 9\\ \hline
1010 & 0010 & 0010 & 10\\ \hline
1011 & 0010 & 0001 & 11\\ \hline
1100 & 0001 & 1000 & 12\\ \hline
1101 & 0001 & 0100 & 13\\ \hline
1110 & 0001 & 0010 & 14\\ \hline
1111 & 0001 & 0001 & 15\\ \hline
\end{tabular}
\end{center}
\caption{Truth table for 4:16 hierarchical decoder.}
\label{table:4-16 hierarchical_decoder}
\end{table}
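
The truth table can be summarized compactly. As a sketch, let $p_i$
and $q_j$ denote the one-hot outputs of predecoder1 and predecoder2
(these names are introduced here for illustration), where $i$ is the
value of $A[3{:}2]$ and $j$ is the value of $A[1{:}0]$. The final
decode stage then computes
\[
  WL_{4i+j} = p_i \wedge q_j, \qquad i, j \in \{0, 1, 2, 3\}.
\]
For instance, $A = 0110$ gives $i = 1$ and $j = 2$, so only $WL_{6}$
is asserted, matching the corresponding row of the table. Note that
the table writes each one-hot code with output 0 as the leftmost
digit.
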
As the address size increases, additional sizes of pre- and final
decoders can be used. In OpenRAM, there are implementations in
\verb|modules/hierarchical_predecode2x4.py| and
\verb|modules/hierarchical_predecode3x8.py| to produce 2:4 and 3:8
predecodes, respectively. These same decoders are used to generate the
column mux select bits as well.
For the final decode, we can use either pnand2 or pnand3 gates. This
allows a maximum size of three 3:8 predecoders along with a final pnand3 decode
stage, that is, 512 wordlines. To extend beyond this, a pnand4 or
a 4:16 predecoder would be needed.
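
The 512-wordline limit follows from a short calculation. Assuming $k$
predecoders that each decode 3 address bits (the symbol $k$ is
introduced here for illustration), the final stage needs $k$-input
gates and produces
\[
  8^{k} = 2^{3k}
\]
wordlines. With pnand3 as the widest available gate, $k = 3$ gives
$8 \times 8 \times 8 = 2^{9} = 512$ wordlines from 9 row address bits.
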
\subsection{Wordline Driver}
\label{sec:wldriver}
The word line driver buffers the address decoder to drive the wordline and
gates the signal until the decode has stabilized. Without waiting, an
incorrectly asserted wordline could erase memory contents.
The word line driver is sized according to the bitcell array width so
that wordlines in larger memory arrays can be appropriately driven.
% gating for first half decode, second half read/write
The first half of the clock cycle is used for address decoding in
OpenRAM. Therefore, the wordline driver is enabled in the second half
of the clock cycle. The buffered clock signal drives each
wordline driver row and is logically ANDed with the decoder output.
% bank clock gating for wordline driver
In multi-bank structures, the buffered clock is also ANDed with the bank
select signal so that an unselected bank is not read or written.
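
This gating can be written as a Boolean sketch. The signal names
$\mathit{dec}_i$, $\mathit{clk\_buf}$, and $\mathit{bank\_sel}$ are
illustrative and are not taken from the OpenRAM netlists, and
$\mathit{clk\_buf}$ is assumed to be active during the second half of
the clock cycle:
\[
  WL_i = \mathit{dec}_i \wedge \mathit{clk\_buf}
  \qquad \mbox{(single bank)}
\]
\[
  WL_i = \mathit{dec}_i \wedge \mathit{clk\_buf} \wedge \mathit{bank\_sel}
  \qquad \mbox{(multiple banks)}
\]
In both cases, a wordline can rise only after the decoder output has
had the first half of the cycle to settle.
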
\begin{figure}[h!]
\centering
\includegraphics[scale=.6]{./figs/wordline_driver.pdf}
\caption{Diagram of word line driver.}
\label{fig:wordline_driver}
\end{figure}
Figure~\ref{fig:wordline_driver} illustrates the wordline driver and
its inputs/outputs. This is implemented in the
\verb|modules/wordline_driver.py| module and matches the number of
rows in the bitcell array of a bank.
OpenRAM creates the wordline drivers using the parameterized pinv and
pnand2 classes. This enables the wordline driver to be matched to the
bitcell height and to be sized to drive the wordline load.
\subsection{Column Mux}
\label{sec:column_mux}
The column mux is an optional module in an SRAM bank. Without a column
mux, the bank is assumed to have a single word in each row. A column
mux enables more than one word to be stored in each row and
read/written individually. The column mux is used for both the read
and write operations by connecting the bitlines of a bank to
both the sense amplifier and the write driver.
In OpenRAM, the column mux uses the {\bf high address bits} to select
the appropriate word in each row. If n-bits are used, there are $2^n$
words in each row. OpenRAM currently allows 2, 4, or 8 words per row,
but the 8-word option is not fully debugged (as of 2/12/18).
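
As a worked sketch of the resulting address split within a single
bank (the symbols $a$ and $n$ and the example sizes are assumptions
for illustration), an address of $a$ bits with $n$ column-select bits
leaves $a - n$ bits for the row decoder:
\[
  \mbox{words per row} = 2^{n}, \qquad \mbox{rows} = 2^{a-n}.
\]
For example, with $a = 8$ and $n = 2$, the two high address bits
select one of 4 words in a row, and the remaining 6 bits select one
of 64 rows.
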
%% OpenRAM provides several options for column mux, but the default
%% is a single-level column mux which is sized for optimal speed.
%% \subsubsection{Tree\_Decoding Column Mux}
%% \label{sec:tree_decoding_column_mux}
%% The schematic for a 4-1 tree
%% multiplexer is shown in Figure~\ref{fig:colmux}.
%% \begin{figure}[h!]
%% \centering
%% \includegraphics[scale=.9]{./figs/tree_column_mux_schem.pdf}
%% \caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
%% \label{fig:colmux}
%% \end{figure}
%% \fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
%% This tree mux selects pairs of bitlines (both BL and BL\_B) as inputs
%% and outputs. This 4-1 tree mux illustrates the process of choosing
%% the correct bitlines if there are 4 words per row in the memory array.
%% Each bitline pair represents a single bit from each word. A binary
%% reduction pattern, shown in Table~\ref{table:colmux}, is used to
%% select the appropriate bitlines. As the number of words per row in
%% the memory array increases, the depth of the column mux grows. The
%% depth of the column mux is equal to the number of bits in the column
%% address bus. The 4-1 tree mux has a depth of 2. In level 1, the
%% least significant bit from the column address bus selects either the
%% first and second words or the third and fourth words. In level 2, the
%% most signifant column address bit selects one of the words passed down
%% from the previous level. Relative to other column mux designs, the
%% tree mus uses significantly less devices. But, this type of design
%% can provide poor performance if a large decoder with many levels are
%% needed. The delay of of a tree mux quadratically increases with each
%% level. Due to this fact, other types of column
%% decoders should be considered for larger arrays.
%% \begin{table}[h!]
%% \begin{center}
%% \begin{tabular}{| c | c | c | c |}
%% \hline
%% Selected BL & Inp1 & Inp2 & Binary\\ \hline
%% BL0 & SEL0\_bar & SEL1\_bar & 00\\ \hline
%% BL1 & SEL0 & SEL1\_bar & 01\\ \hline
%% BL2 & SEL0\_bar & SEL1 & 10\\ \hline
%% BL3 & SEL0 & SEL1 & 11\\
%% \hline
%% \end{tabular}
%% \end{center}
%% \caption{Binary reduction pattern for 4-1 tree column mux.}
%% \label{table:colmux}
%% \end{table}
%% In OpenRAM, the tree column mux is a dynamically generated design. The
%% \verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
%% and \verb|mux_abar|. The only diffference between these cells is that input
%% select signal is either hooked up to the \textbf{SEL} or
%% \textbf{SEL\_bar} signals (see highlighted boxes in
%% Figure~\ref{fig:colmux}). These cells are initialized the the
%% \verb|column_muxa| and \verb|column_muxabar| classes in \verb|columm_mux.py|. Instances
%% of \verb|ptx| PMOS transistors are added to the design and the necessary
%% routing is performed using the \verb|add_rect()| function. A horizontal rail
%% is added in metal2 for both the SEL and Sel\_bar signals. Underneath
%% those input rails, horizontal straps are added. These straps are used
%% to connect the BL and BL\_B outputs from \verb|muxa| to the BL and BL\_B
%% outputs of \verb|mux_abar|. Vertical conenctors in metal3 are added at the
%% bottom of the cell so that connections can be made down to the sense
%% amp. Vertical connectors are also added in metal1 so that the cells
%% can connect down to other mux cells when the depth of the tree mux is
%% more than one level.
%% The \verb|tree_mux_array| class is used to generate the tree mux.
%% Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
%% are tiled row by row. The offset of the cell in a row is determined
%% by the depth of that row in the tree mux. The pattern used to
%% determine the offset of the mux cells is
%% $muxa.width*(i)*(2*row\_depth)$ where is the column number. As the
%% depth increases, the mux cells become further apart. A separate
%% ``for'' loop is invoked if the $depth>1$, which extends the
%% power/ground and select rails across the entire width of the array.
%% Similarly, if the $depth>1$, spice net names are created for the
%% intermediate connection made at the various levels. This is necessary
%% to ensure that a correct spice netlist is generated and that the
%% input/output pins of the column mux match the pins in the modules that
%% it is connected to.
\subsubsection{Single-Level Column Mux}
\label{sec:single_level_column_mux}
OpenRAM includes a single-level pass-gate mux implementation for the
column mux. A single level of NMOS devices is driven by either the
input address (and its complement) or decoded input addresses using a
2:4 predecoder (Section~\ref{sec:hierdecoder}).
Figure~\ref{fig:2t1_single_level_column_mux} shows the schematic of a
2:1 single-level column mux. In this column mux, the {\bf MSB of the
address bus} and its complement drive the pass transistors.
Figure~\ref{fig:4t1_single_level_column_mux} shows the schematic of a
4:1 single-level column mux. The select bits are decoded from the {\bf
2 MSB of the address bus} using a 2:4 decoder. The 2:4 decoder
provides one-hot select signals to select one column.
In OpenRAM, one mux, single\_level\_mux, is dynamically generated in
\verb|modules/single_level_column_mux.py|, and multiple instances of this mux
are tiled together in \verb|modules/single_level_column_mux_array.py|.
single\_level\_mux uses the parameterized ptx (Section~\ref{sec:ptx})
to generate 2 or 4 NMOS transistors for each of the bl and br
bitlines. Horizontal rails are added for the $sel$ signals. The
bitlines are automatically pitch-matched to the bitcell array.
\begin{figure}[h!]
\centering
\includegraphics[scale=.5]{./figs/2t1_single_level_column_mux.pdf}
\caption{Schematic of a 2:1 single level column mux. \fixme{Signals names are wrong.}}
\label{fig:2t1_single_level_column_mux}
\end{figure}
\begin{figure}[h!]
\centering
\includegraphics[scale=.5]{./figs/4t1_single_level_column_mux.pdf}
\caption{Schematic of a 4:1 single level column mux. \fixme{Signals names are wrong.}}
\label{fig:4t1_single_level_column_mux}
\end{figure}
\subsection{Sense Amplifier}
\label{sec:senseamp}
The sense amplifier is used to sense the difference between the
bitline and bitline bar while a read operation is performed.
The sense amplifier also includes two PMOS transistors for bitline
isolation to speed up read operations. The schematic for the sense amp is shown in
Figure~\ref{fig:sense_amp}.
\begin{figure}[h!]
\centering
\includegraphics[scale=.8]{./figs/sense_amp_schem.pdf}
\caption{Schematic of a single sense amplifier cell.}
\label{fig:sense_amp}
\end{figure}
During address decoding (while the wordline is not asserted), the sense
amplifier is disabled and the bitlines are precharged to vdd by the
precharge unit. The two PMOS transistors also connect the bitlines to the sense amplifier.
The en signal comes from the control logic (Section~\ref{sec:control})
including the timing and replica bitline (Section~\ref{sec:RBL}). It
is only enabled after sufficient swing is seen on the bitlines so that
the value can be accurately sensed.
The sense amplifier is enabled by the en signal, which initiates the
read operation, and also isolates the sense amplifier from the
bitlines. This allows the sense amplifier to drive a smaller
capacitance rather than the whole bitline. At this time, the footer
transistor is also enabled which allows the sense amplifier to use
feedback to sense the bitline differential voltage.
When the sense amp is enabled, one of the bitlines experiences a
voltage drop based on the value stored in the memory cell. If a zero
is stored, the bitline voltage drops. If a one is stored, the bitline
bar voltage drops. The output signal is then
taken to a true logic level and latched for output to the data bus.
In OpenRAM, the sense amplifier is a library cell. The associated
layout and spice netlist can be found in the \verb|gds_lib| and
\verb|sp_lib| in the technology directory. The sense\_amp class in
\verb|modules/sense_amp.py| is a single instance of the sense amp
library cell.
The sense\_amp\_array class in \verb|modules/sense_amp_array.py|
handles the tiling of the sense amp cells. One sense amp cell is
needed per data bit and the sense amp cells need to be appropriately
spaced so that they can hook up to the column mux bitline pairs. The
spacing is determined based on the number of words per row in the
memory array.
The sense amp is a library cell so that custom
amplifier designs could be swapped into the memory as needed. The two
major things that need to be considered while designing the sense
amplifier cell are the size of the cell and the bitline/input pitches.
Optimally, the cell should be no wider than the 6T cell so that it
abuts to the column mux and no extra routing or space is needed.
Also, the bitline inputs of the sense amp need to line up with the
outputs of the write driver. In the current version of OpenRAM, the
write driver is situated under the sense amp, which has bitlines
spanning the entire height of the cell. In this case, the sense
amplifier is disabled during a write operation but the bitlines still
connect the write driver to the column mux without any extra routing.
\subsection{Write Driver}
\label{sec:writedriver}
The write driver is used to drive the input signal into the memory
cell during a write operation. It can be seen in
Figure~\ref{fig:write_driver} that the write driver consists of two
tristate buffers, one inverting and one non-inverting. It takes in a
data bit, from the data bus, and outputs that value on the bitline,
and its complement on bitline bar. The bitlines need to be
complements so that the data value can be correctly stored in the 6T
cell. Both tristates are enabled by the EN signal.
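
The behavior of the two tristates can be sketched as follows, where
$D$ is the input data bit, $BL$ and $BR$ are the bitline and bitline
bar, and $Z$ denotes a high-impedance (undriven) output. The symbol
names are illustrative rather than taken from the library cell:
\[
  (BL,\; BR) = \left\{
  \begin{array}{ll}
    (D,\; \overline{D}) & \mbox{if } EN = 1 \\
    (Z,\; Z)            & \mbox{if } EN = 0
  \end{array}
  \right.
\]
When $EN = 0$, the write driver leaves the bitlines free to be driven
by the precharge circuit or the selected bitcell.
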
\begin{figure}[h!]
\centering
\includegraphics[scale=.8]{./figs/write_driver_schem.pdf}
\caption{Schematic of a write driver cell, which consists of 2 tristates (non-inverting and inverting) to drive the bitlines.}
\label{fig:write_driver}
\end{figure}
Currently, in OpenRAM, the write driver is a library cell. The
associated layout and spice netlist can be found in the \verb|gds_lib| and
\verb|sp_lib| in the FreePDK45 directory. Similar to the \verb|sense_amp_array|,
the \verb|write_driver_array| class tiles the write driver cells. One
driver cell is needed per data bit and Vdd, Gnd, and EN signals must
be extended to span the entire width of the cell. It is not optimal to
have the write driver as a library cell because the driver needs to be
sized based on the capacitance of the bitlines. A large memory array
needs a stronger driver to drive the data values into the memory
cells. We are working on creating a parameterized tristate class,
which will dynamically generate write driver cells of different
sizes/strengths.
\subsection{Flip-Flop Array}
In a synchronous SRAM, it is necessary to synchronize the inputs and
outputs with a clock signal by using flip-flops. In FreePDK45, we
provide a library cell for a simple master-slave flip-flop; see the
schematic in Figure~\ref{fig:ms_flop}. In our library cell, we provide
both Q and Q\_bar as outputs of the flop because inverted signals are
used in various modules. The \verb|ms_flop| class in \verb|ms_flop.py|
instantiates a single master-slave flop, and the \verb|ms_flop_array| class
generates an array of flip-flops. Arrays of flops are necessary for
the data bus (an array for both the inputs and outputs) as well as the
address bus (an array for row and column inputs). The \verb|ms_flop_array|
takes the number of flops and the type of array as inputs. Currently,
the type of the array must be exactly one of ``data\_in'', ``data\_out'',
``addr\_row'', or ``addr\_col''. The array type input is
used to look up the associated pin names for each of the flop arrays.
This was implemented very quickly and should be improved in the near
future.
\begin{figure}[h!]
\centering
\includegraphics[scale=.7]{./figs/ms_flop_schem.pdf}
\caption{Schematic of a master-slave flip-flop provided in the FreePDK45 library.}
\label{fig:ms_flop}
\end{figure}
\subsection{Control Logic}
The details of the control logic architecture are outlined in
Section~\ref{sec:control}. The control logic module,
\verb|control_logic.py|, instantiates a \verb|control_logic| class that arranges
all of the flip-flops and logic associated with the control signals
into a single module. Flip-flops are instantiated for each control
signal input and library NAND and NOR gates are used for the logic. A
delay chain of variable length is also generated using parameterized
inverters. The associated layouts and spice netlists can be found in
the \verb|gds_lib| and \verb|sp_lib| in the FreePDK45 directory.
\section{Bank and SRAM}
\label{sec:bank}
The overall memory architecture is shown in Figure~\ref{fig:bank}.
As shown in this figure, one bank contains the following modules:
the precharge-array, which is positioned above the bitcell-array;
the column-mux-array, which is located below the bitcell-array;
the sense-amp-array; the write-driver-array; the data-in-ms-flop-array,
which synchronizes the input data with the negative edge of the clock;
the tri-gate-array, which shares the bidirectional data-bus between input
and output data; the hierarchical decoder (predecoder + decoder), which is placed on the right side
of the bitcell-array; the wordline-driver, which drives
the wordlines horizontally across the bitcell-array; and the address-ms-flops,
which synchronize the input address with the positive edge of the clock.
In the bitcell-array, each memory cell is mirrored vertically and horizontally in order to share VDD and GND rails with adjacent cells and form the array.
The data-bus is connected to the tri-gate, the address-bus is connected to the address-ms-flops, and the bank-select
signal enables the bank when it goes high. To complete the SRAM design, the bank is connected to the control-logic as shown in Figure~\ref{fig:bank}.
The control-logic controls the timing
of the modules inside the bank. CSb, OEb, WEb, and clk are inputs to the control logic, and the outputs of the
control logic are ANDed with the bank-select signal and sent to the corresponding modules.
\begin{figure}[h!]
\centering
\includegraphics[scale=1]{./figs/bank.pdf}
\caption{Overall bank and SRAM architecture.}
\label{fig:bank}
\end{figure}
In order to reduce delay and power, a divided wordline strategy is used in this compiler. Part of the address bits
are used to define the global wordline (bank-select), and the rest of the address bits are connected to the hierarchical
decoder inside each bank to generate local wordlines that actually drive the bitcell access transistors.
As shown in Figure~\ref{fig:bank2}, the SRAM is divided into two banks that share the data-bus, address-bus, control-bus, and control-logic.
In this case, one bit of the address (the most significant bit) goes to an ms-flop, and the outputs of the ms-flop (address-out and address-out-bar)
are connected to the banks as bank-select signals. The control logic is shared between the two banks and, based on which bank is selected,
the control signals activate the modules inside the selected bank. In this architecture, the total cell capacitance is reduced by up
to a factor of two. The power is therefore reduced, and the delay along the wordlines is also reduced.
\begin{figure}[h!]
\centering
\includegraphics[scale=.9]{./figs/bank2.pdf}
\caption{SRAM divided into two banks that share the control-logic.}
\label{fig:bank2}
\end{figure}
In Figure~\ref{fig:bank4}, four banks are connected together. In this case, a 2:4 decoder is added to select one of the banks using the two
most significant bits of the input address. Control signals are connected to all banks but will turn on only the selected bank.
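
The bank selection can be sketched for a power-of-two number of
banks. The symbols $a$, $b$, and $k$ below are introduced here for
illustration. With $2^{b}$ banks and a total of $a$ address bits, the
$b$ most significant bits select the bank and the remaining $a - b$
bits address a word within that bank. In the four-bank case, $b = 2$
and the 2:4 decoder enables bank
\[
  k = 2 \cdot A_{a-1} + A_{a-2}, \qquad k \in \{0, 1, 2, 3\},
\]
where $A_{a-1}$ and $A_{a-2}$ are the two most significant address
bits. The two-bank case corresponds to $b = 1$, where the single most
significant bit and its complement act as the two bank-select signals.
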
\begin{figure}[h!]
\centering
\includegraphics[scale=.9]{./figs/bank4.pdf}
\caption{SRAM divided into 4 banks, which are controlled by the control-logic and a 2:4 decoder.}
\label{fig:bank4}
\end{figure}