mirror of https://github.com/VLSIDA/OpenRAM.git
602 lines
26 KiB
TeX
602 lines
26 KiB
TeX
\section{Modules}
|
|
\label{sec:modules}
|
|
|
|
This section provides an overview of the main modules that are used in
|
|
an SRAM. For each module, we will provide both an architectural
|
|
description and an explanation of how that design is generated and
|
|
used in OpenRAM. The modules described below are provided in the
|
|
first release of OpenRAM, but by no means is this an exhaustive list
|
|
of the possible circuits that can be adapted into a SRAM architecture;
|
|
refer to Section~\ref{sec:implementation} for more information on
|
|
adding different module designs to the compiler.
|
|
|
|
Data structures for schematic and layout are provided in the
|
|
\verb|base| directory. These implement a generic design object and
|
|
have many auxiliary functions for routing, pin access, placement,
|
|
DRC/LVS, etc. These are discussed further in
|
|
Section~\ref{sec:implementation}.
|
|
|
|
Each module has a corresponding Python class in the
|
|
\verb|compiler/modules| directory. These classes are used to generate
|
|
both the GDSII layout and spice netlists. A module can consist of
|
|
hard library cells (Section~\ref{sec:techdir}), paramterized
|
|
cells (Section~\ref{sec:parameterized}) or other modules.
|
|
|
|
When combining modules at any level of hierarchy, DRC rules for
|
|
minimum spacing of metals, wells, etc. must be followed and DRC and
|
|
LVS are run by default after each hierarchical module's creation. A
|
|
module is responsible for creating its own pins to enable routing
|
|
at the next level up in the hierarchy. A module must also define its
|
|
height and width assuming a (0,0) offset for the lower-left coordinate
|
|
to aid with placement.
|
|
|
|
|
|
\subsection{The Bitcell and Bitcell Array}
|
|
\label{sec:bitcellarray}
|
|
|
|
OpenRAM can work with any cell as the bitcell. This could be a foundry
|
|
created one or a user design rule cell for experiments. In addition,
|
|
it could be a common 6T cell or it could be replaced with an 8T, 10T
|
|
or other cell, depending on needs.
|
|
|
|
By default, OpenRAM uses a standard 6T cell as shown in
|
|
Figure~\ref{fig:6t_cell}. The cross coupled inverters hold a single
|
|
data bit that can either be driven into, or read from the cell by the
|
|
bitlines. The access transistors are used to isolate the cell from
|
|
the bitlines so that data is not corrupted while a cell is not being
|
|
accessed.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.9]{figs/cell_6t_schem.pdf}
|
|
\caption{Standard 6T cell.}
|
|
\label{fig:6t_cell}
|
|
\end{figure}
|
|
|
|
% tiling memory cells
|
|
The 6T cells are tiled together in both the horizontal and vertical
|
|
directions to make up the memory array.
|
|
|
|
% keeping it square
|
|
It is common practice to keep the aspect ratio of a memory array
|
|
roughly ``square'' to ensure that the bitlines and wordlines do not
|
|
become too long. If the bitlines are too long, this can increase the
|
|
bitline capacitance, slow down the operation and lead to bitline
|
|
leakage problems. To make an array ``more square'', multiple words
|
|
can share rows by interleaving the bits of each word. The column mux
|
|
in Section~\ref{sec:column_mux} is responsbile for selecting a subset
|
|
of bitcells in a row to extract a word during read and write
|
|
operations.
|
|
|
|
% memory cell is a library cell
|
|
In OpenRAM, we provide a library cell for the 6T cell that can be
|
|
swapped with a fab memory cell, if available. The transitors in the
|
|
cell are sized appropriately considering read and write noise margins.
|
|
|
|
% bitcell and bitcell_array classes
|
|
The bitcell class in \verb|modules/bitcell.py| is a single
|
|
memory cell and is usually a pre-made library cell.
|
|
|
|
% bitcell_array
|
|
The bitcell\_array class in \verb|modules/bitcell_array.py| dynamically
|
|
implements the memory cell array by instantiating a the bitcell class
|
|
in rows and columns.
|
|
|
|
% abutment connections
|
|
During the tiling process, bitcells are abutted so that all bitlines
|
|
and word lines are connected in the vertical and horizontal directions
|
|
respectively. This is done by using the boundary layer to define the
|
|
height and width of the cell. If this is not specified, OpenRAM will
|
|
use the bounding box of all shapes as the boundary. The boundary layer
|
|
should be offset at (0,0) in the lower left coordinate.
|
|
|
|
% flipping
|
|
In order to share supply rails, bitcells are flipped in alternating
|
|
rows.
|
|
|
|
|
|
|
|
\subsection{Precharge Circuitry}
|
|
\label{sec:precharge}
|
|
|
|
The precharge circuit is depicted in Figure~\ref{fig:precharge} and is
|
|
implemented by three PMOS transistors. The input signal to the cell,
|
|
clk, enables all three transistors during the first half of a read or
|
|
write cycle (i.e. while the clock signal is low). M1 and M2 charge bl
|
|
and br to vdd while M3 equalizes the voltages seen between the bitlines.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[width=5cm]{./figs/precharge_schem.pdf}
|
|
\caption{Schematic of a precharge circuit.}
|
|
\label{fig:precharge}
|
|
\end{figure}
|
|
|
|
In OpenRAM, the precharge citcuitry is dynamically generated using the
|
|
parameterized transistor class ptx which is further discussed in
|
|
Section~\ref{sec:ptx}. The offsets of the bitlines and the width of
|
|
the precharge cell are equal to the bitcell so that the bitlines are
|
|
correctly connected by abutment. The precharge class in
|
|
\verb|modules/precharge.py| dynamically generates a single precharge
|
|
cell.
|
|
|
|
\verb|modules/precharge_array.py| creates a row of precharge cells at
|
|
the top of a bitcell array.
|
|
|
|
|
|
|
|
|
|
\subsection{Address Decoders}
|
|
\label{sec:address_decoder}
|
|
|
|
The address decoder deodes the binary-encoded row address bits from the
|
|
address bus as inputs, and asserts a one-hot wordline in the row that
|
|
data is to be read or written. OpenRAM provides a hierarchical address
|
|
decoder as the default, but will soon have other options.
|
|
|
|
The address decoders are created using parameterized gates (pnand2,
|
|
pnand3, pinv) and transistors (ptx). This means that the decoders do
|
|
not rely on any hard library cells.
|
|
|
|
\subsubsection{Hierarchical Decoder}
|
|
\label{sec:hierdecoder}
|
|
|
|
|
|
A simple 2:4 decoder is shown in Figure~\ref{fig:2:4decoder}. This
|
|
decoder computes all of the possible decode values using a single
|
|
level of nand gates along with the inverted and non-inverted inputs.
|
|
As the decoder size increases the size of the nand gates required for
|
|
decoding would increase proportional to the bits to be decoded. This
|
|
would not be practical for large decoders.
|
|
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.6]{./figs/2t4decoder.pdf}
|
|
\caption{Schematic of 2-4 simple decoder.}
|
|
\label{fig:2:4decoder}
|
|
\end{figure}
|
|
|
|
A hierarchical decoder uses two-levels of decoding hierarchy to
|
|
perform an address decode. The first stage computes predecoded values
|
|
while the second stage computes the final decoded values.
|
|
Figure~\ref{fig:4 to 16 decoder} shows a 4:16 heirarchical
|
|
decoder. The decoder uses two 2:4 decoders for
|
|
predecoding and 2-input nand gates and inverters for final decoding to
|
|
form the 4:16 decoder.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.6]{./figs/4t16decoder.pdf}
|
|
\caption{Schematic of 4:16 hierarchical decoder.}
|
|
\label{fig:4 to 16 decoder}
|
|
\end{figure}
|
|
|
|
The predecoder generates a total of 8 intermediate signals from the
|
|
address bits and their complements. These intermediate signals are in
|
|
two groups of 4 from each decoder. The enumeration of all 4 x 4
|
|
predecoded values are used by the final decode to produce the 16
|
|
decoded results. As an example, Table~\ref{table:4-16 hierarchical_decoder}
|
|
gives the detailed input and output siganls for the 4:16 hierarchical
|
|
decoder.
|
|
|
|
|
|
\begin{table}[h!]
|
|
\begin{center}
|
|
\begin{tabular}{| c | c | c | c |}
|
|
\hline
|
|
A[3:0] & predecoder1 & predecoder2 & Selected WL\\ \hline
|
|
0000 & 1000 & 1000 & 0\\ \hline
|
|
0001 & 1000 & 0100 & 1\\ \hline
|
|
0010 & 1000 & 0010 & 2\\ \hline
|
|
0011 & 1000 & 0001 & 3\\ \hline
|
|
0100 & 0100 & 1000 & 4\\ \hline
|
|
0101 & 0100 & 0100 & 5\\ \hline
|
|
0110 & 0100 & 0010 & 6\\ \hline
|
|
0111 & 0100 & 0001 & 7\\ \hline
|
|
1000 & 0010 & 1000 & 8\\ \hline
|
|
1001 & 0010 & 0100 & 9\\ \hline
|
|
1010 & 0010 & 0010 & 10\\ \hline
|
|
1011 & 0010 & 0001 & 11\\ \hline
|
|
1100 & 0001 & 1000 & 12\\ \hline
|
|
1101 & 0001 & 0100 & 13\\ \hline
|
|
1110 & 0001 & 0010 & 14\\ \hline
|
|
1111 & 0001 & 0001 & 15\\ \hline
|
|
\end{tabular}
|
|
\end{center}
|
|
\caption{Truth table for 4:16 hierarchical decoder.}
|
|
\label{table:4-16 hierarchical_decoder}
|
|
\end{table}
|
|
|
|
|
|
As the address size increases, additional sizes of pre- and final
|
|
decoders can be used. In OpenRAM, there are implementations for
|
|
\verb|modules/hierarchical\_predecode2x4.py| and
|
|
\verb|modules/hierarchical\_predecode3x8.py| to produce 2:4 and 3:8
|
|
predecodes, respectively. These same decoders are used to generate the
|
|
column mux select bits as well.
|
|
|
|
For the final decode, we can use either pnand2 or pnand3 gates. This
|
|
allows a maximum size of three 3:8 predocers along with a final pnand3 decode
|
|
stage, or, 512 word lines. To extend beyond this, a pnand4 or
|
|
a 4:16 predecoder would be needed.
|
|
|
|
|
|
\subsection{Wordline Driver}
|
|
\label{sec:wldriver}
|
|
|
|
The word line driver buffers the address decoder to drive the wordline and
|
|
gates the signal until the decode has stabilized. Without waiting, an
|
|
incorrectly asserted wordline could erase memory contents.
|
|
The word line driver is sized according to the bitcell array width so
|
|
that wordlines in larger memory arrays can be appropriately driven.
|
|
|
|
% gating for first half decode, second half read/write
|
|
The first half of the clock cycle is used for address decoding in
|
|
OpenRAM. Therefore, the wordline driver is enabled in the second half
|
|
of the clock cycle in OpenRAM. The buffered clock signal drives each
|
|
wordline driver row and is logically ANDed with the decoder output.
|
|
|
|
% bank clock gating for wordline driver
|
|
In multi-bank structures the clock buffer is also anded with the bank
|
|
select signal to prevent the read/writing of an entire bank.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.6]{./figs/wordline_driver.pdf}
|
|
\caption{Diagram of word line driver.}
|
|
\label{fig:wordline_driver}
|
|
\end{figure}
|
|
|
|
Figure~\ref{fig:wordline_driver} illustrates the wordline driver and
|
|
its inputs/outputs. This is implemented in the
|
|
\verb|modules/wordline_driver.py| module and matches the number of
|
|
rows in the bitcell array of a bank.
|
|
|
|
OpenRAM creates the wordline drivers using the parameterized pinv and
|
|
pnand2 classes. This enables the wordline driver to be matched to the
|
|
bitcell height and to sized to drive the wordline load.
|
|
|
|
|
|
|
|
\subsection{Column Mux}
|
|
\label{sec:column_mux}
|
|
The column mux is an optional module in an SRAM bank. Without a column
|
|
mux, the bank is assumed to have a single word in each row. A column
|
|
mux enables more more than one word to be stored in each row and
|
|
read/written individually. The column mux is used for both the read
|
|
and write operations by connecting the bitlines of a bank to
|
|
both the sense amplifier and the write driver.
|
|
|
|
In OpenRAM, the column mux uses the {\bf high address bits} to select
|
|
the appropriate word in each row. If n-bits are used, there are $2^n$
|
|
words in each row. OpenRAM currently allows 2, 4, or 8 words per row,
|
|
but the 8 words are not fully debugged (as of 2/12/18).
|
|
|
|
%% OpenRAM provides several options for column mux, but the default
|
|
%% is a single-level column mux which is sized for optimal speed.
|
|
|
|
%% \subsubsection{Tree\_Decoding Column Mux}
|
|
%% \label{sec:tree_decoding_column_mux}
|
|
|
|
%% The schematic for a 4-1 tree
|
|
%% multiplexer is shown in Figure~\ref{fig:colmux}.
|
|
|
|
%% \begin{figure}[h!]
|
|
%% \centering
|
|
%% \includegraphics[scale=.9]{./figs/tree_column_mux_schem.pdf}
|
|
%% \caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
|
|
%% \label{fig:colmux}
|
|
%% \end{figure}
|
|
|
|
%% \fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
|
|
|
|
%% This tree mux selects pairs of bitlines (both BL and BL\_B) as inputs
|
|
%% and outputs. This 4-1 tree mux illustrates the process of choosing
|
|
%% the correct bitlines if there are 4 words per row in the memory array.
|
|
%% Each bitline pair represents a single bit from each word. A binary
|
|
%% reduction pattern, shown in Table~\ref{table:colmux}, is used to
|
|
%% select the appropriate bitlines. As the number of words per row in
|
|
%% the memory array increases, the depth of the column mux grows. The
|
|
%% depth of the column mux is equal to the number of bits in the column
|
|
%% address bus. The 4-1 tree mux has a depth of 2. In level 1, the
|
|
%% least significant bit from the column address bus selects either the
|
|
%% first and second words or the third and fourth words. In level 2, the
|
|
%% most signifant column address bit selects one of the words passed down
|
|
%% from the previous level. Relative to other column mux designs, the
|
|
%% tree mus uses significantly less devices. But, this type of design
|
|
%% can provide poor performance if a large decoder with many levels are
|
|
%% needed. The delay of of a tree mux quadratically increases with each
|
|
%% level. Due to this fact, other types of column
|
|
%% decoders should be considered for larger arrays.
|
|
|
|
%% \begin{table}[h!]
|
|
%% \begin{center}
|
|
%% \begin{tabular}{| c | c | c | c |}
|
|
%% \hline
|
|
%% Selected BL & Inp1 & Inp2 & Binary\\ \hline
|
|
%% BL0 & SEL0\_bar & SEL1\_bar & 00\\ \hline
|
|
%% BL1 & SEL0 & SEL1\_bar & 01\\ \hline
|
|
%% BL2 & SEL0\_bar & SEL1 & 10\\ \hline
|
|
%% BL3 & SEL0 & SEL1 & 11\\
|
|
%% \hline
|
|
%% \end{tabular}
|
|
%% \end{center}
|
|
%% \caption{Binary reduction pattern for 4-1 tree column mux.}
|
|
%% \label{table:colmux}
|
|
%% \end{table}
|
|
|
|
%% In OpenRAM, the tree column mux is a dynamically generated design. The
|
|
%% \verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
|
|
%% and \verb|mux_abar|. The only diffference between these cells is that input
|
|
%% select signal is either hooked up to the \textbf{SEL} or
|
|
%% \textbf{SEL\_bar} signals (see highlighted boxes in
|
|
%% Figure~\ref{fig:colmux}). These cells are initialized the the
|
|
%% \verb|column_muxa| and \verb|column_muxabar| classes in \verb|columm_mux.py|. Instances
|
|
%% of \verb|ptx| PMOS transistors are added to the design and the necessary
|
|
%% routing is performed using the \verb|add_rect()| function. A horizontal rail
|
|
%% is added in metal2 for both the SEL and Sel\_bar signals. Underneath
|
|
%% those input rails, horizontal straps are added. These straps are used
|
|
%% to connect the BL and BL\_B outputs from \verb|muxa| to the BL and BL\_B
|
|
%% outputs of \verb|mux_abar|. Vertical conenctors in metal3 are added at the
|
|
%% bottom of the cell so that connections can be made down to the sense
|
|
%% amp. Vertical connectors are also added in metal1 so that the cells
|
|
%% can connect down to other mux cells when the depth of the tree mux is
|
|
%% more than one level.
|
|
|
|
%% The \verb|tree_mux_array| class is used to generate the tree mux.
|
|
%% Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
|
|
%% are tiled row by row. The offset of the cell in a row is determined
|
|
%% by the depth of that row in the tree mux. The pattern used to
|
|
%% determine the offset of the mux cells is
|
|
%% $muxa.width*(i)*(2*row\_depth)$ where is the column number. As the
|
|
%% depth increases, the mux cells become further apart. A separate
|
|
%% ``for'' loop is invoked if the $depth>1$, which extends the
|
|
%% power/ground and select rails across the entire width of the array.
|
|
%% Similarly, if the $depth>1$, spice net names are created for the
|
|
%% intermediate connection made at the various levels. This is necessary
|
|
%% to ensure that a correct spice netlist is generated and that the
|
|
%% input/output pins of the column mux match the pins in the modules that
|
|
%% it is connected to.
|
|
|
|
|
|
\subsubsection{Single-Level Column Mux}
|
|
\label{sec:single_level_column_mux}
|
|
|
|
OpenRAM includes a single-level pass-gate mux implemtation for the
|
|
column mux. A single level of NMOS devices is driven by either the
|
|
input address (and it's complement) or decoded input addresses using a
|
|
2:4 predecoder (Section~\ref{sec:hierdecoder}).
|
|
|
|
Figure~\ref{fig:2t1_single_level_column_mux} shows the schematic of a
|
|
2:1 single-level column mux. In this column mux, the {\bf MSB of the
|
|
address bus} and it's complement drive the pass transistors.
|
|
|
|
Figure~\ref{fig:4t1_single_level_column_mux} shows the schematic of a
|
|
4:1 single-level column mux. The select bits are decoded from the {\bf
|
|
2 MSB of the address bus} using a 2:4 decoder. The 2:4 decoder
|
|
provides one-hot select signals to select one column.
|
|
|
|
In OpenRAM, one mux, single\_level\_mux, is dynamically generated in
|
|
\verb|modules/single_level_column_mux.py| and multiple of these muxes
|
|
are tiled together in \verb|modules/single_level_column_mux_array.py|.
|
|
|
|
single\_level\_mux uses the parameterized ptx (Section~\ref{sec:ptx}
|
|
to generate 2 or 4 NMOS transistors for each the bl and br
|
|
bitlines. Horizontal rails are added for the $sel$ signals. The
|
|
bitlines are automatically pitch-matched to the bitcell array.
|
|
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.5]{./figs/2t1_single_level_column_mux.pdf}
|
|
\caption{Schematic of a 2:1 single level column mux. \fixme{Signals names are wrong.}}
|
|
\label{fig:2t1_single_level_column_mux}
|
|
\end{figure}
|
|
|
|
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.5]{./figs/4t1_single_level_column_mux.pdf}
|
|
\caption{Schematic of a 4:1 single level column mux. \fixme{Signals names are wrong.}}
|
|
\label{fig:4t1_single_level_column_mux}
|
|
\end{figure}
|
|
|
|
|
|
\subsection{Sense Amplifier}
|
|
\label{sec:senseamp}
|
|
The sense amplifier is used to sense the difference between the
|
|
bitline and bitline bar while a read operation is performed.
|
|
The sense amplifier also includes two PMOS transistors for bitline
|
|
isolation to speed-up read operations. The schematic for the sense amp is shown in
|
|
Figure~\ref{fig:sense_amp}.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.8]{./figs/sense_amp_schem.pdf}
|
|
\caption{Schematic of a single sense amplifier cell.}
|
|
\label{fig:sense_amp}
|
|
\end{figure}
|
|
|
|
During address decoding (while the wordline is not asserted), the sense
|
|
amplifier is disabled and the bitlines are precharged to vdd by the
|
|
precharge unit. The two PMOS transistors also connect the bitlines to the sense amplifier.
|
|
|
|
The en signal comes from the control logic (Section~\ref{sec:control})
|
|
including the timing and replica bitline (Section~\ref{sec:RBL}). It
|
|
is only enabled after sufficient swing is seen on the bitlines so that
|
|
the value can be accurately sensed.
|
|
|
|
The sense amplifier is enabled by the en signal, which initiates the
|
|
read operation, and also isolates the sense amplifier from the
|
|
bitlines. This allows the sense amplifier to drive a smaller
|
|
capacitance rather than the whole bitline. At this time, the footer
|
|
transistor is also enabled which allows the sense amplifier to use
|
|
feedback to sense the bitline differential voltage.
|
|
|
|
When the sense amp is enabled, one of the bitlines experiences a
|
|
voltage drop based on the value stored in the memory cell. If a zero
|
|
is stored, the bitline voltage drops. If a one is stored, the bitline
|
|
bar voltage drops. The output signal is then
|
|
taken to a true logic level and latched for output to the data bus.
|
|
|
|
In OpenRAM, the sense amplifier is a libray cell. The associated
|
|
layout and spice netlist can be found in the \verb|gds_lib| and
|
|
\verb|sp_lib| in the technology directory. The sense\_amp class in
|
|
\verb|modules/sense_amp.py| is a single instance of the sense amp
|
|
library cell.
|
|
|
|
|
|
The sense\_amp\_array class in \verb|modules/sense_amp_array.py|
|
|
handles the tiling of the sense amps cells. One sense amp cell is
|
|
needed per data bit and the sense amp cells need to be appropriately
|
|
spaced so that they can hook up to the column mux bitline pairs. The
|
|
spacing is determined based on the number of words per row in the
|
|
memory array.
|
|
|
|
The sense amp is a library cell so that custom
|
|
amplifier designs could be swapped into the memory as needed. The two
|
|
major things that need to be considered while designing the sense
|
|
amplifier cell are the size of the cell and the bitline/input pitches.
|
|
Optimally, the cell should be no wider than the 6T cell so that it
|
|
abuts to the column mux and no extra routing or space is needed.
|
|
Also, the bitline inputs of the sense amp need to line up with the
|
|
outputs of the write driver. In the current version of OpenRAM, the
|
|
write driver is situated under the sense amp, which had bitlines
|
|
spaning the entire height of the cell. In this case, the sense
|
|
amplifier is disabled during a write operation but the bitlines still
|
|
connect the write driver to the column mux without any extra routing.
|
|
|
|
|
|
\subsection{Write Driver}
|
|
\label{sec:writedriver}
|
|
|
|
The write driver is used to drive the input signal into the memory
|
|
cell during a write operation. It can be seen in
|
|
Figure~\ref{fig:write_driver} that the write driver consists of two
|
|
tristate buffers, one inverting and one non-inverting. It takes in a
|
|
data bit, from the data bus, and outputs that value on the bitline,
|
|
and its complement on bitline bar. The bitlines need to be
|
|
complements so that the data value can be correctly stored in the 6T
|
|
cell. Both tristates are enabled by the EN signal.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.8]{./figs/write_driver_schem.pdf}
|
|
\caption{Schematic of a write driver cell, which consists of 2 tristates (non-inverting and inverting) to drive the bitlines.}
|
|
\label{fig:write_driver}
|
|
\end{figure}
|
|
|
|
Currently, in OpenRAM, the write driver is a library cell. The
|
|
associated layout and spice netlist can be found in the \verb|gds_lib| and
|
|
\verb|sp_lib| in the FreePDK45 directory. Similar to the \verb|sense_amp_array|,
|
|
the \verb|write_driver_array| class tiles the write driver cells. One
|
|
driver cell is needed per data bit and Vdd, Gnd, and EN signals must
|
|
be extended to span the entire width of the cell. It is not optimal to
|
|
have the write driver as a library cell because the driver needs to be
|
|
sized based on the capacitance of the bitlines. A large memory array
|
|
needs a stronger driver to drive the data values into the memory
|
|
cells. We are working on creating a parameterized tristate class,
|
|
which will dynamically generate write driver cells of different
|
|
sizes/strengths.
|
|
|
|
\subsection{Flip-Flop Array}
|
|
|
|
In a synchronous SRAM it is necessary to synchronize the inputs and
|
|
outputs with a clock signal by using flip-flops. In FreePDK45 we
|
|
provide a library cell for a simple master-slave flip-flop, see
|
|
schematic in Figure~\ref{fig:ms_flop}. In our library cell we provide
|
|
both Q and Q\_bar as outputs of the flop because inverted signals are
|
|
used in various modules. The \verb|ms_flop| class in \verb|ms_flop.py|
|
|
instatitates a single master-slave flop, and the \verb|ms_flop_array| class
|
|
generates an array of flip-flops. Arrays of flops are necessary for
|
|
the data bus (an array for both the inputs and outputs) as well as the
|
|
address bus (an array for row and column inputs). The \verb|ms_flop_array|
|
|
takes the number of flops and the type of array as inputs. Currently,
|
|
the type of the array must be either ``data\_in'', ``data\_out'',
|
|
``addr\_row'', or ``addr\_col'' verbatim. The array type input is
|
|
used to look up that associated pin names for each of the flop arrays.
|
|
This was implemented very quickly and should be improved in the near
|
|
future...
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.7]{./figs/ms_flop_schem.pdf}
|
|
\caption{Schematic of a master-slave flip-flop provided in FreePDK45 library}
|
|
\label{fig:ms_flop}
|
|
\end{figure}
|
|
|
|
\subsection{Control Logic}
|
|
|
|
The details of the control logic architecture are outlined in
|
|
Section~\ref{sec:control}. The control logic module,
|
|
\verb|control_logic.py|, instantiates a \verb|control_logic| class that arranges
|
|
all of the flip-flops and logic associated with the control signals
|
|
into a single module. Flip-flops are instantiated for each control
|
|
signal input and library NAND and NOR gates are used for the logic. A
|
|
delay chain, of variable length, is also generted using parameterized
|
|
inverters. The associated layouts and spice netlists can be found in
|
|
the \verb|gds_lib| and \verb|sp_lib| in the FreePDK45 directory.
|
|
|
|
\section{Bank and SRAM}
|
|
\label{sec:bank}
|
|
|
|
The overall memory architecture is shown in figure~\ref{fig:bank}.
|
|
As shown in this figure one Bank contains different modules including
|
|
precharge-array which is positioned above the bitcell-array,
|
|
column-mux-array which is located below the bitcell-array,
|
|
sense-amp-array, write-driver-array, data-in-ms-flop-array
|
|
to synchronize the input data with negative edge of the clock,
|
|
tri-gata-array to share the bidirectional data-bus between input
|
|
and output data, hierarchical decoder which is placed on the right side
|
|
of the bitcell-array (predecoder + decoder), wordline-driver which drives
|
|
the wordlines horizontally across the bitcell-array and address-ms-flops
|
|
to synchronize the input address with positive edge of the clock.
|
|
|
|
In bitcell-array each memory cell is mirrored vertically and horizontally inorder to share VDD and GND rails with adjacent cells and form the array.
|
|
Data-bus is connected to tri-gate, address-bus is connected to address-ms-flops and bank-select
|
|
signal will enable the bank when it goes high. To complete the SRAM design, bank is connected to control-logic as shown in figure~\ref{fig:bank}.
|
|
Control-logic controls the timing
|
|
of modules inside the bank. CSb, OEb, Web and clk are inputs to the control logic and output of
|
|
control logic will ANDed with bank-select signal and send to the corresponding modules.
|
|
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=1]{./figs/bank.pdf}
|
|
\caption{Overal bank and SRAM architecture.}
|
|
\label{fig:bank}
|
|
\end{figure}
|
|
|
|
|
|
In order to reduce the delay and power, divided wordline strategy have been used in this compiler. Part of the address bits
|
|
are used to define the global wordline (bank-select) and rest of address bits are connected to hierarchical
|
|
decoder inside each bank to generate local wordlines that actually drive the bitcell access transistors.
|
|
|
|
As shown in figure~\ref{fig:bank2} SRAM is divided to two banks which share data-bus, address-bus, control-bus and control-logic.
|
|
In this case one bit of address (most significant bit) goes to an ms-flop and outputs of ms-flop (address-out and address-out-bar)
|
|
are connected to banks as bank-select signals. Control logic is shared between two banks and based on which bank is selected,
|
|
control signals will activate modules inside the selected bank. In this architecture, the total cell capacitance is reduced by up
|
|
to a factor of two. Therefore the power will be reduced greatly and the delay among the wordlines is also reduced.
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.9]{./figs/bank2.pdf}
|
|
\caption{SRAM is divided to two banks which share the control-logic.}
|
|
\label{fig:bank2}
|
|
\end{figure}
|
|
|
|
In figure~\ref{fig:bank4}, four banks are connected together. In this case a 2:4 decoder is added to select one of the banks using two
|
|
most significant bits of input address. Control signals are connected to all banks but will turn on only the selected bank.
|
|
|
|
|
|
\begin{figure}[h!]
|
|
\centering
|
|
\includegraphics[scale=.9]{./figs/bank4.pdf}
|
|
\caption{SRAM is divided to 4 banks wich are controlled by the control-logic and a 2:4 decoder.}
|
|
\label{fig:bank4}
|
|
\end{figure}
|
|
|
|
|