mirror of https://github.com/VLSIDA/OpenRAM.git
Update bitcell and array section.
parent
d2ed35526a
commit
21967fccde
249
docs/modules.tex
|
|
@ -34,9 +34,12 @@ to aid with placement.
|
|||
\subsection{The Bitcell and Bitcell Array}
|
||||
\label{sec:bitcellarray}
|
||||
|
||||
The 6T cell is the most commonly used memory cell in SRAM devices. It
|
||||
is named a 6T cell because it consists of 6 transistors: 2 access
|
||||
transistors and 2 cross coupled inverters as shown in
|
||||
OpenRAM can work with any cell as the bitcell. This could be a
foundry-created cell or a user-designed cell for experiments. In addition,
|
||||
it could be a common 6T cell or it could be replaced with an 8T, 10T
|
||||
or other cell, depending on needs.
|
||||
|
||||
By default, OpenRAM uses a standard 6T cell as shown in
|
||||
Figure~\ref{fig:6t_cell}. The cross coupled inverters hold a single
|
||||
data bit that can either be driven into, or read from the cell by the
|
||||
bitlines. The access transistors are used to isolate the cell from
|
||||
|
|
@ -45,70 +48,52 @@ accessed.
|
|||
|
||||
\begin{figure}[h!]
|
||||
\centering
|
||||
\includegraphics[scale=.9]{./figs/cell_6t_schem.pdf}
|
||||
\caption{Schematic of 6T cell.}
|
||||
\includegraphics[scale=.9]{figs/cell_6t_schem.pdf}
|
||||
\caption{Standard 6T cell.}
|
||||
\label{fig:6t_cell}
|
||||
\end{figure}
|
||||
|
||||
% memory cell operation
|
||||
The 6T cell can be accessed to perform the two main operations
|
||||
associated with memory: reading and writing. When a read is to be
|
||||
performed, both bitlines are precharged to Vdd. This precharging is
|
||||
done during the first half of the read cycle and is handled by the
|
||||
precharge circuitry. In the second half of the read cycle the
|
||||
wordline is asserted, which enables the access transistors. If a 1 is
|
||||
stored in the cell then BLB is discharged to Gnd and BL is pulled up
|
||||
to Vdd. Conversely, if the value stored is a 0, then BL is discharged
|
||||
to Gnd and BLB is pulled up to Vdd. While performing a write
|
||||
operation, both bitlines are also precharged to Vdd during the first
|
||||
half of the write cycle. Again, the wordline is asserted, and the
|
||||
access transistors are enabled. The value that is to be written into
|
||||
the cell is applied to BL, and its complement is applied to BLB. The
|
||||
drivers that are applying the signals to the bitlines must be
|
||||
appropriately sized so that the previous value in the cell can be
|
||||
overwritten.
|
||||
|
||||
% tiling memory cells
|
||||
The 6T cells are tiled together in both the horizontal and vertical
|
||||
directions to make up the memory array. The size of the memory array
|
||||
is directly related to the numbers of words, and the size of those
|
||||
words, that will need to be stored in the RAM. For example, an 8kb
|
||||
memory with a word size of 8 bits could be implemented as 8 columns
|
||||
and 1024 rows.
|
||||
directions to make up the memory array.
|
||||
|
||||
% keeping it square
|
||||
It is common practice to keep the aspect ratio of the memory array as
|
||||
square as possible\footnote{Future versions will consider optimizing
|
||||
delay and/or power as well.}. This helps to make sure that the
|
||||
bitlines do not become too long, which can increase the bitline
|
||||
capacitance, slow down the operation and lead to more leakage. To
|
||||
make the design ``more square'', multiple words can share rows by
|
||||
interleaving the bits of each word. If the previous 8kb memory was
|
||||
rearranged to allow 2 words per row, then the array would have 16
|
||||
columns and 512 rows.
|
||||
It is common practice to keep the aspect ratio of a memory array
|
||||
roughly ``square'' to ensure that the bitlines and wordlines do not
|
||||
become too long. If the bitlines are too long, this can increase the
|
||||
bitline capacitance, slow down the operation and lead to bitline
|
||||
leakage problems. To make an array ``more square'', multiple words
|
||||
can share rows by interleaving the bits of each word. The column mux
|
||||
in Section~\ref{sec:column_mux} is responsible for selecting a subset
|
||||
of bitcells in a row to extract a word during read and write
|
||||
operations.
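
The relationship between the number of words per row and the array
dimensions can be sketched in a few lines of Python. This is only an
illustration: the function \verb|array_dimensions| and its parameter names
are hypothetical and are not taken from the OpenRAM source. It assumes the
number of words divides evenly by the number of words per row.

```python
def array_dimensions(num_words, word_size, words_per_row):
    """Return (rows, cols) of a bitcell array.

    Hypothetical helper for illustration only; not part of OpenRAM.
    """
    if words_per_row < 1 or num_words % words_per_row != 0:
        raise ValueError("num_words must be a multiple of words_per_row")
    # Each row holds words_per_row interleaved words.
    rows = num_words // words_per_row
    # Each word contributes word_size columns to the row.
    cols = word_size * words_per_row
    return rows, cols


if __name__ == "__main__":
    # 1024 words of 8 bits each, first with 1 and then with 2 words per row.
    print(array_dimensions(1024, 8, 1))
    print(array_dimensions(1024, 8, 2))
```

With 1024 words of 8 bits, moving from one to two words per row halves the
number of rows and doubles the number of columns, which brings the array
closer to square.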
|
||||
|
||||
% memory cell is a library cell
|
||||
In OpenRAM, we provide a library cell for the 6T cell so that users
|
||||
can easily swap in different memory cell designs. The memory cell is
|
||||
the most important cell in the RAM and should be customized to
|
||||
minimize area and optimize performance. The memory cell is the most
|
||||
replicated cell in the RAM; minimizing its size can have a drastic
|
||||
effect on the overall size of the RAM. Also, the transistors in the cell
|
||||
must be carefully sized to allow for correct read and write operation
|
||||
as well as protection against corruption.
|
||||
In OpenRAM, we provide a library cell for the 6T cell that can be
|
||||
swapped with a fab memory cell, if available. The transistors in the
|
||||
cell are sized appropriately considering read and write noise margins.
|
||||
|
||||
% bitcell and bitcell_array classes
|
||||
The \verb|bitcell| class in \verb|bitcell.py| instantiates a single
|
||||
memory cell and is usually a pre-made library cell. The
|
||||
\verb|bitcell_array| class in \verb|bitcell_array.py| dynamically
|
||||
implements the memory cell array by instantiating a single memory cell
|
||||
according to the number of rows and columns. During the tiling
|
||||
process, the cells are abutted so that all bitlines and word lines are
|
||||
connected in the vertical and horizontal directions respectively. In
|
||||
order to share supply rails, cells are flipped in alternating rows. To
|
||||
avoid any extra routing, the power/ground rails, bitlines, and
|
||||
wordlines should span the entire width/height of the cell so that they
|
||||
are automatically connected when the cells are abutted.
|
||||
The bitcell class in \verb|modules/bitcell.py| is a single
|
||||
memory cell and is usually a pre-made library cell.
|
||||
|
||||
% bitcell_array
|
||||
The bitcell\_array class in \verb|modules/bitcell_array.py| dynamically
|
||||
implements the memory cell array by instantiating the bitcell class
|
||||
in rows and columns.
|
||||
|
||||
% abutment connections
|
||||
During the tiling process, bitcells are abutted so that all bitlines
|
||||
and word lines are connected in the vertical and horizontal directions
|
||||
respectively. This is done by using the boundary layer to define the
|
||||
height and width of the cell. If this is not specified, OpenRAM will
|
||||
use the bounding box of all shapes as the boundary. The boundary layer
|
||||
should have its lower-left corner at (0,0).
|
||||
|
||||
% flipping
|
||||
In order to share supply rails, bitcells are flipped in alternating
|
||||
rows.
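
The tiling and row flipping can be sketched as follows. This is a minimal
illustration under stated assumptions, not the code in
\verb|modules/bitcell_array.py|: the function \verb|tile_bitcells| is
hypothetical, and the orientation labels \verb|R0| (no mirroring) and
\verb|MX| (mirrored about the x-axis) are common layout conventions assumed
here. It also assumes each cell's boundary has its lower-left corner at
(0,0).

```python
def tile_bitcells(rows, cols, cell_width, cell_height):
    """Return a list of (row, col, (x, y), orientation) placements.

    Hypothetical sketch for illustration only; not OpenRAM's implementation.
    """
    placements = []
    for row in range(rows):
        flipped = row % 2 == 1
        # A cell mirrored about the x-axis extends downward from its origin,
        # so its origin is shifted up by one cell height to keep it in its row.
        y = (row + 1) * cell_height if flipped else row * cell_height
        orientation = "MX" if flipped else "R0"
        for col in range(cols):
            # Cells abut horizontally: each origin is one cell width apart.
            x = col * cell_width
            placements.append((row, col, (x, y), orientation))
    return placements


if __name__ == "__main__":
    for placement in tile_bitcells(2, 2, 1.0, 2.0):
        print(placement)
```

Because odd rows are mirrored, the supply rail at the top of an even row
lands on the same rail of the flipped row above it, so the two rows can
share it.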
|
||||
|
||||
|
||||
|
||||
\subsection{Precharge Circuitry}
|
||||
|
|
@ -271,7 +256,7 @@ takes the transistor size and cell height as inputs (so that it can abutt the
|
|||
|
||||
|
||||
\subsection{Column Mux}
|
||||
|
||||
\label{sec:column_mux}
|
||||
The column mux takes the column address bits from the address bus and
|
||||
selects the appropriate bitlines for the word that is to be read from
|
||||
or written to. It takes n-bits from the address bus and can select
|
||||
|
|
@ -282,88 +267,88 @@ sense ampflifier and the write driver.
|
|||
OpenRAM provides several options for column mux, but the default
|
||||
is a single-level column mux which is sized for optimal speed.
|
||||
|
||||
\subsubsection{Tree\_Decoding Column Mux}
|
||||
\label{sec:tree_decoding_column_mux}
|
||||
%% \subsubsection{Tree\_Decoding Column Mux}
|
||||
%% \label{sec:tree_decoding_column_mux}
|
||||
|
||||
The schematic for a 4-1 tree
|
||||
multiplexer is shown in Figure~\ref{fig:colmux}.
|
||||
%% The schematic for a 4-1 tree
|
||||
%% multiplexer is shown in Figure~\ref{fig:colmux}.
|
||||
|
||||
\begin{figure}[h!]
|
||||
\centering
|
||||
\includegraphics[scale=.9]{./figs/tree_column_mux_schem.pdf}
|
||||
\caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
|
||||
\label{fig:colmux}
|
||||
\end{figure}
|
||||
%% \begin{figure}[h!]
|
||||
%% \centering
|
||||
%% \includegraphics[scale=.9]{./figs/tree_column_mux_schem.pdf}
|
||||
%% \caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
|
||||
%% \label{fig:colmux}
|
||||
%% \end{figure}
|
||||
|
||||
\fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
|
||||
%% \fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
|
||||
|
||||
This tree mux selects pairs of bitlines (both BL and BL\_B) as inputs
|
||||
and outputs. This 4-1 tree mux illustrates the process of choosing
|
||||
the correct bitlines if there are 4 words per row in the memory array.
|
||||
Each bitline pair represents a single bit from each word. A binary
|
||||
reduction pattern, shown in Table~\ref{table:colmux}, is used to
|
||||
select the appropriate bitlines. As the number of words per row in
|
||||
the memory array increases, the depth of the column mux grows. The
|
||||
depth of the column mux is equal to the number of bits in the column
|
||||
address bus. The 4-1 tree mux has a depth of 2. In level 1, the
|
||||
least significant bit from the column address bus selects either the
|
||||
first and second words or the third and fourth words. In level 2, the
|
||||
most significant column address bit selects one of the words passed down
|
||||
from the previous level. Relative to other column mux designs, the
|
||||
tree mux uses significantly fewer devices. But, this type of design
|
||||
can provide poor performance if a large decoder with many levels is
|
||||
needed. The delay of a tree mux increases quadratically with each
|
||||
level. Due to this fact, other types of column
|
||||
decoders should be considered for larger arrays.
|
||||
%% This tree mux selects pairs of bitlines (both BL and BL\_B) as inputs
|
||||
%% and outputs. This 4-1 tree mux illustrates the process of choosing
|
||||
%% the correct bitlines if there are 4 words per row in the memory array.
|
||||
%% Each bitline pair represents a single bit from each word. A binary
|
||||
%% reduction pattern, shown in Table~\ref{table:colmux}, is used to
|
||||
%% select the appropriate bitlines. As the number of words per row in
|
||||
%% the memory array increases, the depth of the column mux grows. The
|
||||
%% depth of the column mux is equal to the number of bits in the column
|
||||
%% address bus. The 4-1 tree mux has a depth of 2. In level 1, the
|
||||
%% least significant bit from the column address bus selects either the
|
||||
%% first and second words or the third and fourth words. In level 2, the
|
||||
%% most significant column address bit selects one of the words passed down
|
||||
%% from the previous level. Relative to other column mux designs, the
|
||||
%% tree mux uses significantly fewer devices. But, this type of design
|
||||
%% can provide poor performance if a large decoder with many levels is
|
||||
%% needed. The delay of a tree mux increases quadratically with each
|
||||
%% level. Due to this fact, other types of column
|
||||
%% decoders should be considered for larger arrays.
|
||||
|
||||
\begin{table}[h!]
|
||||
\begin{center}
|
||||
\begin{tabular}{| c | c | c | c |}
|
||||
\hline
|
||||
Selected BL & Inp1 & Inp2 & Binary\\ \hline
|
||||
BL0 & SEL0\_bar & SEL1\_bar & 00\\ \hline
|
||||
BL1 & SEL0 & SEL1\_bar & 01\\ \hline
|
||||
BL2 & SEL0\_bar & SEL1 & 10\\ \hline
|
||||
BL3 & SEL0 & SEL1 & 11\\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{center}
|
||||
\caption{Binary reduction pattern for 4-1 tree column mux.}
|
||||
\label{table:colmux}
|
||||
\end{table}
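
The binary reduction in Table~\ref{table:colmux} can be modelled
behaviorally with a short Python sketch. The function
\verb|tree_mux_select| is hypothetical and only mimics the selection logic.
It does not describe the generated layout or netlist. It assumes that
\verb|sel_bits[0]| is SEL0 and drives the first level, and that each level
halves the number of candidate bitlines.

```python
def tree_mux_select(bitlines, sel_bits):
    """Return the bitline chosen by a binary tree mux.

    Hypothetical behavioral model for illustration only.
    sel_bits[0] is SEL0 (first level), sel_bits[1] is SEL1, and so on.
    """
    candidates = list(bitlines)
    if len(candidates) != 2 ** len(sel_bits):
        raise ValueError("need 2**depth bitlines for depth select bits")
    for bit in sel_bits:
        if bit not in (0, 1):
            raise ValueError("select bits must be 0 or 1")
        # Pair up adjacent candidates and keep one from each pair:
        # the first when the bit is 0 (SEL_bar), the second when it is 1 (SEL).
        candidates = [candidates[i + bit] for i in range(0, len(candidates), 2)]
    return candidates[0]


if __name__ == "__main__":
    bls = ["BL0", "BL1", "BL2", "BL3"]
    for sel1 in (0, 1):
        for sel0 in (0, 1):
            print(sel1, sel0, tree_mux_select(bls, [sel0, sel1]))
```

The number of loop iterations equals the number of select bits, which
matches the statement that the depth of the tree equals the number of
column address bits.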
|
||||
%% \begin{table}[h!]
|
||||
%% \begin{center}
|
||||
%% \begin{tabular}{| c | c | c | c |}
|
||||
%% \hline
|
||||
%% Selected BL & Inp1 & Inp2 & Binary\\ \hline
|
||||
%% BL0 & SEL0\_bar & SEL1\_bar & 00\\ \hline
|
||||
%% BL1 & SEL0 & SEL1\_bar & 01\\ \hline
|
||||
%% BL2 & SEL0\_bar & SEL1 & 10\\ \hline
|
||||
%% BL3 & SEL0 & SEL1 & 11\\
|
||||
%% \hline
|
||||
%% \end{tabular}
|
||||
%% \end{center}
|
||||
%% \caption{Binary reduction pattern for 4-1 tree column mux.}
|
||||
%% \label{table:colmux}
|
||||
%% \end{table}
|
||||
|
||||
In OpenRAM, the tree column mux is a dynamically generated design. The
|
||||
\verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
|
||||
and \verb|mux_abar|. The only difference between these cells is that the input
|
||||
select signal is either hooked up to the \textbf{SEL} or
|
||||
\textbf{SEL\_bar} signals (see highlighted boxes in
|
||||
Figure~\ref{fig:colmux}). These cells are initialized by the
|
||||
\verb|column_muxa| and \verb|column_muxabar| classes in \verb|columm_mux.py|. Instances
|
||||
of \verb|ptx| PMOS transistors are added to the design and the necessary
|
||||
routing is performed using the \verb|add_rect()| function. A horizontal rail
|
||||
is added in metal2 for both the SEL and SEL\_bar signals. Underneath
|
||||
those input rails, horizontal straps are added. These straps are used
|
||||
to connect the BL and BL\_B outputs from \verb|muxa| to the BL and BL\_B
|
||||
outputs of \verb|mux_abar|. Vertical connectors in metal3 are added at the
|
||||
bottom of the cell so that connections can be made down to the sense
|
||||
amp. Vertical connectors are also added in metal1 so that the cells
|
||||
can connect down to other mux cells when the depth of the tree mux is
|
||||
more than one level.
|
||||
%% In OpenRAM, the tree column mux is a dynamically generated design. The
|
||||
%% \verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
|
||||
%% and \verb|mux_abar|. The only difference between these cells is that the input
|
||||
%% select signal is either hooked up to the \textbf{SEL} or
|
||||
%% \textbf{SEL\_bar} signals (see highlighted boxes in
|
||||
%% Figure~\ref{fig:colmux}). These cells are initialized by the
|
||||
%% \verb|column_muxa| and \verb|column_muxabar| classes in \verb|columm_mux.py|. Instances
|
||||
%% of \verb|ptx| PMOS transistors are added to the design and the necessary
|
||||
%% routing is performed using the \verb|add_rect()| function. A horizontal rail
|
||||
%% is added in metal2 for both the SEL and SEL\_bar signals. Underneath
|
||||
%% those input rails, horizontal straps are added. These straps are used
|
||||
%% to connect the BL and BL\_B outputs from \verb|muxa| to the BL and BL\_B
|
||||
%% outputs of \verb|mux_abar|. Vertical connectors in metal3 are added at the
|
||||
%% bottom of the cell so that connections can be made down to the sense
|
||||
%% amp. Vertical connectors are also added in metal1 so that the cells
|
||||
%% can connect down to other mux cells when the depth of the tree mux is
|
||||
%% more than one level.
|
||||
|
||||
The \verb|tree_mux_array| class is used to generate the tree mux.
|
||||
Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
|
||||
are tiled row by row. The offset of the cell in a row is determined
|
||||
by the depth of that row in the tree mux. The pattern used to
|
||||
determine the offset of the mux cells is
|
||||
$muxa.width*(i)*(2*row\_depth)$ where $i$ is the column number. As the
|
||||
depth increases, the mux cells become further apart. A separate
|
||||
``for'' loop is invoked if the $depth>1$, which extends the
|
||||
power/ground and select rails across the entire width of the array.
|
||||
Similarly, if the $depth>1$, spice net names are created for the
|
||||
intermediate connections made at the various levels. This is necessary
|
||||
to ensure that a correct spice netlist is generated and that the
|
||||
input/output pins of the column mux match the pins in the modules that
|
||||
it is connected to.
|
||||
%% The \verb|tree_mux_array| class is used to generate the tree mux.
|
||||
%% Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
|
||||
%% are tiled row by row. The offset of the cell in a row is determined
|
||||
%% by the depth of that row in the tree mux. The pattern used to
|
||||
%% determine the offset of the mux cells is
|
||||
%% $muxa.width*(i)*(2*row\_depth)$ where $i$ is the column number. As the
|
||||
%% depth increases, the mux cells become further apart. A separate
|
||||
%% ``for'' loop is invoked if the $depth>1$, which extends the
|
||||
%% power/ground and select rails across the entire width of the array.
|
||||
%% Similarly, if the $depth>1$, spice net names are created for the
|
||||
%% intermediate connections made at the various levels. This is necessary
|
||||
%% to ensure that a correct spice netlist is generated and that the
|
||||
%% input/output pins of the column mux match the pins in the modules that
|
||||
%% it is connected to.
|
||||
|
||||
|
||||
\subsubsection{Single\_Level Column Mux}
|
||||
|
|
|
|||
Binary file not shown.