\caption{Schematic of 4 to 16 hierarchical decoder.}
\label{fig:4 to 16 decoder}
\end{figure}
Figure~\ref{fig:4 to 16 decoder} shows the 4:16 hierarchical decoder. The decoder consists of two 2:4 decoders for predecoding, followed by 2-input NAND gates and inverters for final decoding.
In the predecoder, a total of 8 intermediate signals are generated from the address bits and their complements.
Building an address decoder from a predecoding stage and a final decoding stage is very productive, since small
decoders such as the 2:4 decoder are reused for predecoding. The operation of the 4:16 hierarchical decoder can be explained with an example. If the address is A0A1A2A3=0000, the outputs of predecoder1 and predecoder2 will be
WL0WL1WL2WL3=1000 and WL0WL1WL2WL3=1000, respectively. According to the connections in Figure~\ref{fig:4 to 16 decoder}, wordline 0 of predecoder1 and wordline 0 of predecoder2 are connected
to the first 2-input NAND gate in the decode stage, which drives wordline 0 of the final decoding stage. Hence, depending on the combination
of the input signals, one of the wordlines will rise. In this case, since the address input is A0A1A2A3=0000, wordline 0 goes high. Table~\ref{table:4-16 hierarchical_decoder} gives the detailed input and output signals.
\caption{Truth table for 4:16 hierarchical decoder.}
\label{table:4-16 hierarchical_decoder}
\end{table}
As the number of address lines increases, higher-level decoders can be created from lower-level decoders. For example, an 8:256 decoder can be formed from two instances of the 4:16 decoder followed by 256 2-input NAND gates and inverters.
To construct the 8:256 decoder, the 4:16 decoder must first be constructed from 2:4 decoders. Hence the name hierarchical decoder.
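The hierarchy described above can be illustrated with a small behavioral model. The following is only a sketch, not OpenRAM code: the function names \verb|nand2|, \verb|inv| and \verb|hier_decode| are hypothetical, and the address is assumed to be given most significant bit first. Each level splits the address in half, predecodes each half, and combines every pair of predecoder outputs with a 2-input NAND gate and an inverter.

```python
def nand2(a, b):
    """2-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def inv(a):
    """Inverter on a 0/1 value."""
    return 0 if a else 1


def hier_decode(bits):
    """Return the one-hot wordline list for an address given MSB first (assumed)."""
    if len(bits) == 1:
        # Base case: a single address bit and its complement.
        return [inv(bits[0]), bits[0]]
    half = len(bits) // 2
    upper = hier_decode(bits[:half])  # predecoder for the upper address bits
    lower = hier_decode(bits[half:])  # predecoder for the lower address bits
    # Final stage: one NAND gate plus one inverter per wordline.
    return [inv(nand2(u, l)) for u in upper for l in lower]


# Address 0000 raises wordline 0 of the 4:16 decoder.
wordlines = hier_decode((0, 0, 0, 0))
print(wordlines.index(1))

# An 8-bit address uses two 4:16 decoders and 256 final NAND/inverter pairs.
print(len(hier_decode((0,) * 8)))
```

With a 2-bit address the model reproduces the predecoder output 1000 for input 00, and with an 8-bit address the final stage has 256 outputs, matching the gate count given in the text.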
\subsection{Wordline Driver}
\label{sec:wldriver}
Word line drivers are inserted between the word line
output of the address decoder and the word line input of the bitcell array. The word
line drivers ensure that, as the size of the memory array increases
and the word line length and capacitance increase, the word line
signal is still able to turn on the access transistors in the 6T cell. Also, because the bank select signal
in multi-bank structures is ANDed with the word line output of the decoder,
bitcells turn on only when their bank is selected.
Figure~\ref{fig:wordline_driver} shows the diagram of the word line driver and its input/output pins.
In OpenRAM, word line drivers are created using the \verb|pinv| and \verb|nand2| classes, which
take the transistor size and cell height as inputs (so that the driver can abut the
6T cell). The word line driver is added as a separate module in \verb|compiler|.
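The logical function of the driver can be sketched as follows. This is a behavioral illustration only: the functions below are hypothetical stand-ins for the logic of a NAND gate and an inverter, not the OpenRAM \verb|nand2| and \verb|pinv| layout classes, and they ignore transistor sizing and drive strength.

```python
def nand2(a, b):
    """Logic of a 2-input NAND gate on 0/1 values."""
    return 0 if (a and b) else 1


def pinv(a):
    """Logic of an inverter on a 0/1 value."""
    return 0 if a else 1


def wordline_driver(decoder_out, bank_sel):
    """AND each decoder output with the bank select (NAND followed by inverter)."""
    return [pinv(nand2(wl, bank_sel)) for wl in decoder_out]


# The selected wordline is passed on only when the bank is selected.
print(wordline_driver([0, 1, 0, 0], 1))
print(wordline_driver([0, 1, 0, 0], 0))
```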
\caption{Schematic of 4-1 tree column mux that passes both of the bitlines.}
\label{fig:colmux}
\end{figure}
\fixme{Shading/opacity is different on different platforms. Make this a box in the image. It doesn't work on OSX.}
This tree mux takes pairs of bitlines (both BL and BL\_B) as inputs
and outputs. The 4-1 tree mux illustrates the process of choosing
the correct bitlines when there are 4 words per row in the memory array.
Each bitline pair represents a single bit from each word. A binary
reduction pattern, shown in Table~\ref{table:colmux}, is used to
select the appropriate bitlines. As the number of words per row in
the memory array increases, the depth of the column mux grows. The
depth of the column mux is equal to the number of bits in the column
address bus. The 4-1 tree mux has a depth of 2. In level 1, the
least significant bit of the column address bus selects between the
first and second words and between the third and fourth words. In level 2, the
most significant column address bit selects one of the words passed down
from the previous level. Relative to other column mux designs, the
tree mux uses significantly fewer devices. However, this type of design
can provide poor performance if a large decoder with many levels is
needed. The delay of a tree mux increases quadratically with the number of
levels. For this reason, other types of column
decoders should be considered for larger arrays.
\begin{table}[h!]
\begin{center}
\begin{tabular}{| c | c | c | c |}
\hline
Selected BL & Inp1 & Inp2 & Binary\\\hline
BL0 & SEL0\_bar & SEL1\_bar & 00\\\hline
BL1 & SEL0 & SEL1\_bar & 01\\\hline
BL2 & SEL0\_bar & SEL1 & 10\\\hline
BL3 & SEL0 & SEL1 & 11\\
\hline
\end{tabular}
\end{center}
\caption{Binary reduction pattern for 4-1 tree column mux.}
\label{table:colmux}
\end{table}
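The reduction pattern in Table~\ref{table:colmux} can be mimicked with a short behavioral sketch. The function name \verb|tree_mux_select| and the representation of bitline pairs as tuples are assumptions made for illustration; the select bits are given least significant bit first, so the first entry plays the role of SEL0 and the second of SEL1.

```python
def tree_mux_select(bitline_pairs, sel_bits):
    """Reduce the bitline pairs level by level; sel_bits[0] is SEL0 (level 1)."""
    if len(bitline_pairs) != 2 ** len(sel_bits):
        raise ValueError("need 2**depth bitline pairs")
    current = list(bitline_pairs)
    for sel in sel_bits:
        # Each level keeps one entry out of every adjacent pair.
        current = [current[k + sel] for k in range(0, len(current), 2)]
    return current[0]


pairs = [("BL0", "BL0_B"), ("BL1", "BL1_B"), ("BL2", "BL2_B"), ("BL3", "BL3_B")]

# SEL0 = 0, SEL1 = 1 corresponds to binary 10 in the table.
print(tree_mux_select(pairs, [0, 1]))
```

Each pass through the loop corresponds to one level of the tree, so the number of iterations equals the depth of the mux.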
In OpenRAM, the tree column mux is a dynamically generated design. The
\verb|tree_mux_array| is made up of two dynamically generated cells: \verb|muxa|
and \verb|mux_abar|. The only difference between these cells is that the input
select signal is hooked up to either the \textbf{SEL} or the
\textbf{SEL\_bar} signal (see highlighted boxes in
Figure~\ref{fig:colmux}). These cells are initialized by the
\verb|column_muxa| and \verb|column_muxabar| classes in \verb|column_mux.py|. Instances
of \verb|ptx| PMOS transistors are added to the design and the necessary
routing is performed using the \verb|add_rect()| function. A horizontal rail
is added in metal2 for each of the SEL and SEL\_bar signals. Underneath
those input rails, horizontal straps are added. These straps
connect the BL and BL\_B outputs of \verb|muxa| to the BL and BL\_B
outputs of \verb|mux_abar|. Vertical connectors in metal3 are added at the
bottom of the cell so that connections can be made down to the sense
amp. Vertical connectors are also added in metal1 so that the cells
can connect down to other mux cells when the depth of the tree mux is
more than one level.
The \verb|tree_mux_array| class is used to generate the tree mux.
Instances of both the \verb|muxa| and \verb|mux_abar| cells are instantiated and
tiled row by row. The offset of a cell in a row is determined
by the depth of that row in the tree mux. The pattern used to
determine the offset of the mux cells is
$muxa.width*(i)*(2*row\_depth)$, where $i$ is the column number. As the
depth increases, the mux cells are placed further apart. A separate
``for'' loop is invoked if $depth>1$, which extends the
power/ground and select rails across the entire width of the array.
Similarly, if $depth>1$, SPICE net names are created for the
intermediate connections made at the various levels. This is necessary
to ensure that a correct SPICE netlist is generated and that the
input/output pins of the column mux match the pins of the modules that
it is connected to.
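As a worked illustration of the offset pattern, the sketch below evaluates the expression exactly as written above. The function name \verb|mux_cell_offsets| and the unit cell width are hypothetical, and it is assumed here that the row depth is counted from 1; the actual \verb|tree_mux_array| code may differ in these details.

```python
def mux_cell_offsets(muxa_width, row_depth, num_cells):
    """Evaluate muxa.width * i * (2 * row_depth) for i = 0 .. num_cells - 1."""
    return [muxa_width * i * (2 * row_depth) for i in range(num_cells)]


# With a unit cell width, cells in the second row are spaced twice as far
# apart as cells in the first row.
print(mux_cell_offsets(1.0, 1, 4))
print(mux_cell_offsets(1.0, 2, 2))
```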
\subsubsection{Single-Level Column Mux}
\label{sec:single_level_column_mux}
The optimal design for a column mux uses a single NMOS device per bitline, driven by the input address or decoded input addresses.
Figure~\ref{fig:2t1_single_level_column_mux} shows the schematic of a 2:1 single-level column mux. In this column mux, one address bit
and its complement drive the pass transistors. The selected transistors
connect their corresponding bitlines (one set of columns out of two) to the sense-amp and write-driver circuitry for a read or write operation.
Figure~\ref{fig:4t1_single_level_column_mux} shows the schematic of a 4:1 single-level column mux. In this column mux, two
address bits are decoded using a 2:4 decoder (the 2:4 decoder is explained in Section~\ref{sec:hierdecoder}). The 2:4 decoder provides a one-hot set of outputs, so only one set of columns
is selected and connected to the sense-amp and write-driver
(in Figure~\ref{fig:4t1_single_level_column_mux}, one set of columns out of four is selected).
In OpenRAM, the \verb|single-level_mux_array| is a dynamically generated design
made up of a dynamically generated cell (\verb|single-level_mux|).
\verb|single-level_mux| uses the parameterized transistor class \verb|ptx| to generate two NMOS transistors,
which connect the BL and BLB of the selected columns to the sense-amp and write-driver. Horizontal rails are added for the $sel$ signals. Vertical
straps connect the BL and BLB of bitcell\_array to the BL and BLB of the single-level column mux, and the BL-out and BLB-out of the single-level
column mux to the BL and BLB of the sense-amp and write-driver.
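The selection performed by the single-level column mux can be sketched behaviorally as follows. The names \verb|one_hot_select| and \verb|single_level_mux| are hypothetical and only model the logic; the column address is assumed to be given most significant bit first, and each column is represented by its (BL, BLB) pair.

```python
def one_hot_select(address_bits):
    """Decode the column address (MSB first, assumed) into one-hot select lines."""
    index = 0
    for bit in address_bits:
        index = (index << 1) | bit
    return [1 if k == index else 0 for k in range(2 ** len(address_bits))]


def single_level_mux(column_pairs, sel):
    """Pass the (BL, BLB) pair whose pass transistors are turned on."""
    selected = [pair for pair, s in zip(column_pairs, sel) if s]
    if len(selected) != 1:
        raise ValueError("select lines must be one-hot")
    return selected[0]


columns = [("BL0", "BLB0"), ("BL1", "BLB1"), ("BL2", "BLB2"), ("BL3", "BLB3")]

# 4:1 case: two address bits are decoded into four one-hot select lines.
print(single_level_mux(columns, one_hot_select((1, 0))))

# 2:1 case: one address bit and its complement act as the select lines.
print(single_level_mux(columns[:2], one_hot_select((1,))))
```

Unlike the tree mux, every bitline passes through a single pass transistor regardless of the number of columns, at the cost of decoding the column address first.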