mirror of https://github.com/VLSIDA/OpenRAM.git
Update bitcell and array section.
parent
d2ed35526a
commit
21967fccde
249
docs/modules.tex
|
|
@ -34,9 +34,12 @@ to aid with placement.
|
||||||
\subsection{The Bitcell and Bitcell Array}
|
\subsection{The Bitcell and Bitcell Array}
|
||||||
\label{sec:bitcellarray}
|
\label{sec:bitcellarray}
|
||||||
|
|
||||||
The 6T cell is the most commonly used memory cell in SRAM devices. It
|
OpenRAM can work with any cell as the bitcell. This could be a foundry
|
||||||
is named a 6T cell because it consists of 6 transistors: 2 access
|
created one or a user design rule cell for experiments. In addition,
|
||||||
transistors and 2 cross coupled inverters as shown in
|
it could be a common 6T cell or it could be replaced with an 8T, 10T
|
||||||
|
or other cell, depending on needs.
|
||||||
|
|
||||||
|
By default, OpenRAM uses a standard 6T cell as shown in
|
||||||
Figure~\ref{fig:6t_cell}. The cross coupled inverters hold a single
|
Figure~\ref{fig:6t_cell}. The cross coupled inverters hold a single
|
||||||
data bit that can either be driven into, or read from the cell by the
|
data bit that can either be driven into, or read from the cell by the
|
||||||
bitlines. The access transistors are used to isolate the cell from
|
bitlines. The access transistors are used to isolate the cell from
|
||||||
|
|
@ -45,70 +48,52 @@ accessed.
|
||||||
|
|
||||||
\begin{figure}[h!]
|
\begin{figure}[h!]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[scale=.9]{./figs/cell_6t_schem.pdf}
|
\includegraphics[scale=.9]{figs/cell_6t_schem.pdf}
|
||||||
\caption{Schematic of 6T cell.}
|
\caption{Standard 6T cell.}
|
||||||
\label{fig:6t_cell}
|
\label{fig:6t_cell}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
% memory cell operation
|
|
||||||
The 6T cell can be accessed to perform the two main operations
|
|
||||||
associated with memory: reading and writing. When a read is to be
|
|
||||||
performed, both bitlines are precharged to VDD. This precharging is
|
|
||||||
done during the first half of the read cycle and is handled by the
|
|
||||||
precharge circuitry. In the second half of the read cycle the
|
|
||||||
wordline is asserted, which enables the access transistors. If a 1 is
|
|
||||||
stored in the cell then BLB is discharged to Gnd and BL is pulled up
|
|
||||||
to Vdd. Conversely, if the value stored is a 0, then BL is discharged
|
|
||||||
to Gnd and BLB is pulled up to Vdd. While performing a write
|
|
||||||
operation, both bitlines are also precharged to Vdd during the first
|
|
||||||
half of the write cycle. Again, the wordline is asserted, and the
|
|
||||||
access transistors are enabled. The value that is to be written into
|
|
||||||
the cell is applied to BL, and its complement is applied to BLB. The
|
|
||||||
drivers that are applying the signals to the bitlines must be
|
|
||||||
appropriately sized so that the previous value in the cell can be
|
|
||||||
overwritten.
|
|
||||||
|
|
||||||
% tiling memory cells
|
% tiling memory cells
|
||||||
The 6T cells are tiled together in both the horizontal and vertical
|
The 6T cells are tiled together in both the horizontal and vertical
|
||||||
directions to make up the memory array. The size of the memory array
|
directions to make up the memory array.
|
||||||
is directly related to the numbers of words, and the size of those
|
|
||||||
words, that will need to be stored in the RAM. For example, an 8kb
|
|
||||||
memory with a word size of 8 bits could be implemented as 8 columns
|
|
||||||
and 1024 rows.
|
|
||||||
|
|
||||||
% keeping it square
|
% keeping it square
|
||||||
It is common practice to keep the aspect ratio of the memory array as
|
It is common practice to keep the aspect ratio of a memory array
|
||||||
square as possible\footnote{Future versions will consider optimizing
|
roughly ``square'' to ensure that the bitlines and wordlines do not
|
||||||
delay and/or power as well.}. This helps to make sure that the
|
become too long. If the bitlines are too long, this can increase the
|
||||||
bitlines do not become too long, which can increase the bitline
|
bitline capacitance, slow down the operation and lead to bitline
|
||||||
capacitance, slow down the operation and lead to more leakage. To
|
leakage problems. To make an array ``more square'', multiple words
|
||||||
make the design ``more square'', multiple words can share rows by
|
can share rows by interleaving the bits of each word. The column mux
|
||||||
interleaving the bits of each word. If the previous 8kb memory was
|
in Section~\ref{sec:column_mux} is responsible for selecting a subset
|
||||||
rearranged to allow 2 words per row, then the array would have 16
|
of bitcells in a row to extract a word during read and write
|
||||||
columns and 512 rows.
|
operations.
|
||||||
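The effect of sharing rows between words can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not OpenRAM's implementation: the function names `array_dimensions` and `interleaved_columns` are hypothetical, and the column ordering (bit `b` of word `w` placed at column `b * words_per_row + w`) is an assumed interleaving scheme.

```python
def array_dimensions(num_words, word_size, words_per_row):
    """Return (rows, cols) of a bitcell array that stores num_words words
    of word_size bits with words_per_row words sharing each row."""
    if num_words % words_per_row != 0:
        raise ValueError("num_words must be a multiple of words_per_row")
    rows = num_words // words_per_row
    cols = word_size * words_per_row
    return rows, cols


def interleaved_columns(word_index, word_size, words_per_row):
    """Return the columns holding the bits of one word within a row.

    Assumption (hypothetical layout): bits are interleaved so that bit b
    of word w sits at column b * words_per_row + w.
    """
    return [bit * words_per_row + word_index for bit in range(word_size)]


# 1024 words of 8 bits, one word per row: a tall, narrow array.
print(array_dimensions(1024, 8, 1))
# The same memory with 2 words per row is closer to square.
print(array_dimensions(1024, 8, 2))
```

Under these assumptions, a column mux would pick every `words_per_row`-th column, starting at the selected word's index, to extract one word from a row.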
|
|
||||||
% memory cell is a library cell
|
% memory cell is a library cell
|
||||||
In OpenRAM, we provide a library cell for the 6T cell so that users
|
In OpenRAM, we provide a library cell for the 6T cell that can be
|
||||||
can easily swap in different memory cell designs. The memory cell is
|
swapped with a fab memory cell, if available. The transistors in the
|
||||||
the most important cell in the RAM and should be customized to
|
cell are sized appropriately considering read and write noise margins.
|
||||||
minimize area and optimize performance. The memory cell is the most
|
|
||||||
replicated cell in the RAM; minimizing its size can have a drastic
|
|
||||||
effect on the overall size of the RAM. Also, the transistors in the cell
|
|
||||||
must be carefully sized to allow for correct read and write operation
|
|
||||||
as well as protection against corruption.
|
|
||||||
|
|
||||||
% bitcell and bitcell_array classes
|
% bitcell and bitcell_array classes
|
||||||
The \verb|bitcell| class in \verb|bitcell.py| instantiates a single
|
The bitcell class in \verb|modules/bitcell.py| is a single
|
||||||
memory cell and is usually a pre-made library cell. The
|
memory cell and is usually a pre-made library cell.
|
||||||
\verb|bitcell_array| class in \verb|bitcell_array.py| dynamically
|
|
||||||
implements the memory cell array by instantiating a single memory cell
|
% bitcell_array
|
||||||
according to the number of rows and columns. During the tiling
|
The bitcell\_array class in \verb|modules/bitcell_array.py| dynamically
|
||||||
process, the cells are abutted so that all bitlines and word lines are
|
implements the memory cell array by instantiating the bitcell class
|
||||||
connected in the vertical and horizontal directions respectively. In
|
in rows and columns.
|
||||||
order to share supply rails, cells are flipped in alternating rows. To
|
|
||||||
avoid any extra routing, the power/ground rails, bitlines, and
|
% abutment connections
|
||||||
wordlines should span the entire width/height of the cell so that they
|
During the tiling process, bitcells are abutted so that all bitlines
|
||||||
are automatically connected when the cells are abutted.
|
and word lines are connected in the vertical and horizontal directions
|
||||||
|
respectively. This is done by using the boundary layer to define the
|
||||||
|
height and width of the cell. If this is not specified, OpenRAM will
|
||||||
|
use the bounding box of all shapes as the boundary. The boundary layer
|
||||||
|
should be placed with its lower left corner at (0,0).
|
||||||
|
|
||||||
|
% flipping
|
||||||
|
In order to share supply rails, bitcells are flipped in alternating
|
||||||
|
rows.
|
||||||
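The tiling with alternating row flips can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in \verb|modules/bitcell_array.py|: the function `tile_bitcells` is hypothetical, the cell boundary is assumed to start at (0,0), and the orientation labels `"R0"` (unflipped) and `"MX"` (mirrored about the x-axis) are illustrative names.

```python
def tile_bitcells(rows, cols, cell_width, cell_height):
    """Return a list of (row, col, x, y, orientation) placements.

    Cells abut exactly at their boundaries, so bitlines line up vertically
    and wordlines line up horizontally. Odd rows are mirrored about the
    x-axis so that adjacent rows can share a supply rail.
    """
    placements = []
    for row in range(rows):
        flipped = row % 2 == 1
        for col in range(cols):
            x = col * cell_width
            # A cell mirrored about the x-axis extends downward from its
            # origin, so its origin is shifted up by one cell height.
            y = (row + 1) * cell_height if flipped else row * cell_height
            placements.append((row, col, x, y, "MX" if flipped else "R0"))
    return placements


for placement in tile_bitcells(rows=2, cols=2, cell_width=1.0, cell_height=2.0):
    print(placement)
```

Because each placement depends only on the cell width and height, the sketch relies on the boundary defining those dimensions, as described above.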
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Precharge Circuitry}
|
\subsection{Precharge Circuitry}
|
||||||
|
|
@ -271,7 +256,7 @@ takes the transistor size and cell height as inputs (so that it can abutt the
|
||||||
|
|
||||||
|
|
||||||
\subsection{Column Mux}
|
\subsection{Column Mux}
|
||||||
|
\label{sec:column_mux}
|
||||||
The column mux takes the column address bits from the address bus and
|
The column mux takes the column address bits from the address bus and
|
||||||
selects the appropriate bitlines for the word that is to be read from
|
selects the appropriate bitlines for the word that is to be read from
|
||||||
or written to. It takes n-bits from the address bus and can select
|
or written to. It takes n-bits from the address bus and can select
|
||||||
|
|
@ -282,88 +267,88 @@ sense ampflifier and the write driver.
|
||||||
OpenRAM provides several options for column mux, but the default
|
OpenRAM provides several options for column mux, but the default
|
||||||
is a single-level column mux which is sized for optimal speed.
|
is a single-level column mux which is sized for optimal speed.
|
||||||
|
|
||||||
\subsubsection{Tree\_Decoding Column Mux}
|
%% \subsubsection{Tree\_Decoding Column Mux}
|
||||||
\label{sec:tree_decoding_column_mux}
|
%% \label{sec:tree_decoding_column_mux}
|
||||||
|
|
||||||
The schematic for a 4-1 tree
|
%% The schematic for a 4-1 tree
|
||||||
multiplexer is shown in Figure~\ref{fig:colmux}.
|
%% multiplexer is shown in Figure~\ref{fig:colmux}.
|
||||||
|
|
||||||
\begin{figure}[h!]
|
%% \begin{figure}[h!]
|
||||||
\centering
|
%% \centering
|
||||||
\includegraphics[scale=.9]{./figs/tree_column_mux_schem.pdf}
|
%% \includegraphics[scale=.9]{./figs/tree_column_mux_schem.pdf}
|
||||||
\caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
|
%% \caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
|
||||||
\label{fig:colmux}
|
%% \label{fig:colmux}
|
||||||
\end{figure}
|
%% \end{figure}
|
||||||
|
|
||||||
\fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
|
%% \fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
|
||||||
|
|
||||||
This tree mux selects pairs of bitlines (both BL and BL\_B) as inputs
|
%% This tree mux selects pairs of bitlines (both BL and BL\_B) as inputs
|
||||||
and outputs. This 4-1 tree mux illustrates the process of choosing
|
%% and outputs. This 4-1 tree mux illustrates the process of choosing
|
||||||
the correct bitlines if there are 4 words per row in the memory array.
|
%% the correct bitlines if there are 4 words per row in the memory array.
|
||||||
Each bitline pair represents a single bit from each word. A binary
|
%% Each bitline pair represents a single bit from each word. A binary
|
||||||
reduction pattern, shown in Table~\ref{table:colmux}, is used to
|
%% reduction pattern, shown in Table~\ref{table:colmux}, is used to
|
||||||
select the appropriate bitlines. As the number of words per row in
|
%% select the appropriate bitlines. As the number of words per row in
|
||||||
the memory array increases, the depth of the column mux grows. The
|
%% the memory array increases, the depth of the column mux grows. The
|
||||||
depth of the column mux is equal to the number of bits in the column
|
%% depth of the column mux is equal to the number of bits in the column
|
||||||
address bus. The 4-1 tree mux has a depth of 2. In level 1, the
|
%% address bus. The 4-1 tree mux has a depth of 2. In level 1, the
|
||||||
least significant bit from the column address bus selects either the
|
%% least significant bit from the column address bus selects either the
|
||||||
first and second words or the third and fourth words. In level 2, the
|
%% first and second words or the third and fourth words. In level 2, the
|
||||||
most significant column address bit selects one of the words passed down
|
%% most significant column address bit selects one of the words passed down
|
||||||
from the previous level. Relative to other column mux designs, the
|
%% from the previous level. Relative to other column mux designs, the
|
||||||
tree mux uses significantly fewer devices. But, this type of design
|
%% tree mux uses significantly fewer devices. But, this type of design
|
||||||
can provide poor performance if a large decoder with many levels is
|
%% can provide poor performance if a large decoder with many levels is
|
||||||
needed. The delay of a tree mux quadratically increases with each
|
%% needed. The delay of a tree mux quadratically increases with each
|
||||||
level. Due to this fact, other types of column
|
%% level. Due to this fact, other types of column
|
||||||
decoders should be considered for larger arrays.
|
%% decoders should be considered for larger arrays.
|
||||||
|
|
||||||
\begin{table}[h!]
|
%% \begin{table}[h!]
|
||||||
\begin{center}
|
%% \begin{center}
|
||||||
\begin{tabular}{| c | c | c | c |}
|
%% \begin{tabular}{| c | c | c | c |}
|
||||||
\hline
|
%% \hline
|
||||||
Selected BL & Inp1 & Inp2 & Binary\\ \hline
|
%% Selected BL & Inp1 & Inp2 & Binary\\ \hline
|
||||||
BL0 & SEL0\_bar & SEL1\_bar & 00\\ \hline
|
%% BL0 & SEL0\_bar & SEL1\_bar & 00\\ \hline
|
||||||
BL1 & SEL0 & SEL1\_bar & 01\\ \hline
|
%% BL1 & SEL0 & SEL1\_bar & 01\\ \hline
|
||||||
BL2 & SEL0\_bar & SEL1 & 10\\ \hline
|
%% BL2 & SEL0\_bar & SEL1 & 10\\ \hline
|
||||||
BL3 & SEL0 & SEL1 & 11\\
|
%% BL3 & SEL0 & SEL1 & 11\\
|
||||||
\hline
|
%% \hline
|
||||||
\end{tabular}
|
%% \end{tabular}
|
||||||
\end{center}
|
%% \end{center}
|
||||||
\caption{Binary reduction pattern for 4-1 tree column mux.}
|
%% \caption{Binary reduction pattern for 4-1 tree column mux.}
|
||||||
\label{table:colmux}
|
%% \label{table:colmux}
|
||||||
\end{table}
|
%% \end{table}
|
||||||
|
|
||||||
In OpenRAM, the tree column mux is a dynamically generated design. The
|
%% In OpenRAM, the tree column mux is a dynamically generated design. The
|
||||||
\verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
|
%% \verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
|
||||||
and \verb|mux_abar|. The only difference between these cells is that the input
|
%% and \verb|mux_abar|. The only difference between these cells is that the input
|
||||||
select signal is either hooked up to the \textbf{SEL} or
|
%% select signal is either hooked up to the \textbf{SEL} or
|
||||||
\textbf{SEL\_bar} signals (see highlighted boxes in
|
%% \textbf{SEL\_bar} signals (see highlighted boxes in
|
||||||
Figure~\ref{fig:colmux}). These cells are initialized by the
|
%% Figure~\ref{fig:colmux}). These cells are initialized by the
|
||||||
\verb|column_muxa| and \verb|column_muxabar| classes in \verb|columm_mux.py|. Instances
|
%% \verb|column_muxa| and \verb|column_muxabar| classes in \verb|columm_mux.py|. Instances
|
||||||
of \verb|ptx| PMOS transistors are added to the design and the necessary
|
%% of \verb|ptx| PMOS transistors are added to the design and the necessary
|
||||||
routing is performed using the \verb|add_rect()| function. A horizontal rail
|
%% routing is performed using the \verb|add_rect()| function. A horizontal rail
|
||||||
is added in metal2 for both the SEL and Sel\_bar signals. Underneath
|
%% is added in metal2 for both the SEL and Sel\_bar signals. Underneath
|
||||||
those input rails, horizontal straps are added. These straps are used
|
%% those input rails, horizontal straps are added. These straps are used
|
||||||
to connect the BL and BL\_B outputs from \verb|muxa| to the BL and BL\_B
|
%% to connect the BL and BL\_B outputs from \verb|muxa| to the BL and BL\_B
|
||||||
outputs of \verb|mux_abar|. Vertical connectors in metal3 are added at the
|
%% outputs of \verb|mux_abar|. Vertical connectors in metal3 are added at the
|
||||||
bottom of the cell so that connections can be made down to the sense
|
%% bottom of the cell so that connections can be made down to the sense
|
||||||
amp. Vertical connectors are also added in metal1 so that the cells
|
%% amp. Vertical connectors are also added in metal1 so that the cells
|
||||||
can connect down to other mux cells when the depth of the tree mux is
|
%% can connect down to other mux cells when the depth of the tree mux is
|
||||||
more than one level.
|
%% more than one level.
|
||||||
|
|
||||||
The \verb|tree_mux_array| class is used to generate the tree mux.
|
%% The \verb|tree_mux_array| class is used to generate the tree mux.
|
||||||
Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
|
%% Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
|
||||||
are tiled row by row. The offset of the cell in a row is determined
|
%% are tiled row by row. The offset of the cell in a row is determined
|
||||||
by the depth of that row in the tree mux. The pattern used to
|
%% by the depth of that row in the tree mux. The pattern used to
|
||||||
determine the offset of the mux cells is
|
%% determine the offset of the mux cells is
|
||||||
$muxa.width*(i)*(2*row\_depth)$ where $i$ is the column number. As the
|
%% $muxa.width*(i)*(2*row\_depth)$ where $i$ is the column number. As the
|
||||||
depth increases, the mux cells become further apart. A separate
|
%% depth increases, the mux cells become further apart. A separate
|
||||||
``for'' loop is invoked if the $depth>1$, which extends the
|
%% ``for'' loop is invoked if the $depth>1$, which extends the
|
||||||
power/ground and select rails across the entire width of the array.
|
%% power/ground and select rails across the entire width of the array.
|
||||||
Similarly, if the $depth>1$, spice net names are created for the
|
%% Similarly, if the $depth>1$, spice net names are created for the
|
||||||
intermediate connection made at the various levels. This is necessary
|
%% intermediate connection made at the various levels. This is necessary
|
||||||
to ensure that a correct spice netlist is generated and that the
|
%% to ensure that a correct spice netlist is generated and that the
|
||||||
input/output pins of the column mux match the pins in the modules that
|
%% input/output pins of the column mux match the pins in the modules that
|
||||||
it is connected to.
|
%% it is connected to.
|
||||||
|
|
||||||
|
|
||||||
\subsubsection{Single\_Level Column Mux}
|
\subsubsection{Single\_Level Column Mux}
|
||||||
|
|
|
||||||
Binary file not shown.