diff --git a/docs/figs/Array.svg b/docs/figs/Array.svg
deleted file mode 100644
index 419083d3..00000000
--- a/docs/figs/Array.svg
+++ /dev/null
@@ -1,1475 +0,0 @@
[deleted SVG markup omitted; text labels in the figure: Array, Cell, Col.Mux, Precharge, Bl, Br]
diff --git a/docs/figs/column_mux_schem.pdf b/docs/figs/column_tree_mux.pdf
similarity index 100%
rename from docs/figs/column_mux_schem.pdf
rename to docs/figs/column_tree_mux.pdf
diff --git a/docs/figs/column_mux_schem.svg b/docs/figs/column_tree_mux.svg
similarity index 100%
rename from docs/figs/column_mux_schem.svg
rename to docs/figs/column_tree_mux.svg
diff --git a/docs/figs/decoder_to _array.svg b/docs/figs/decoder_to _array.svg
deleted file mode 100644
index 9b7499f4..00000000
--- a/docs/figs/decoder_to _array.svg
+++ /dev/null
@@ -1,409 +0,0 @@
[deleted SVG markup omitted; text labels in the figure: Address Decoder, Array, Vdd, Word Line, Vss, N-Well, P-Well]
diff --git a/docs/figs/layout_view_1024_16.png b/docs/figs/layout_view_1024_16.png
deleted file mode 100644
index a97bfe63..00000000
Binary files a/docs/figs/layout_view_1024_16.png and /dev/null differ
diff --git a/docs/figs/layout_view_64_4.png b/docs/figs/layout_view_64_4.png
deleted file mode 100644
index 44e7b060..00000000
Binary files a/docs/figs/layout_view_64_4.png and /dev/null differ
diff --git a/docs/figs/nand2.pdf b/docs/figs/nand2.pdf
deleted file mode 100644
index e6bda803..00000000
Binary files a/docs/figs/nand2.pdf and /dev/null differ
diff --git a/docs/figs/nand3.pdf b/docs/figs/nand3.pdf
deleted file mode 100644
index b5f91a52..00000000
Binary files a/docs/figs/nand3.pdf and /dev/null differ
diff --git a/docs/modules.tex b/docs/modules.tex
index 7f4d0c07..19ed7e82 100644
--- a/docs/modules.tex
+++ b/docs/modules.tex
@@ -127,79 +127,58 @@ the top of a bitcell array.
 \subsection{Address Decoders}
-\label{sec:addressdecoder}
+\label{sec:address_decoder}
-The address decoder takes the row address bits from the address bus as
-inputs, and asserts the appropriate wordline in the row that data is
-to be read or written. A n-bit address input controls $2^n$ word
-lines.
+The address decoder decodes the binary-encoded row address bits from the
+address bus and asserts a one-hot wordline for the row in which
+data is to be read or written. OpenRAM provides a hierarchical address
+decoder as the default, but will soon have other options.
-OpenRAM provides a hierarchical address decoder as the default, but
-will soon have other options.
+The address decoders are created using parameterized gates (pnand2,
+pnand3, pinv) and transistors (ptx). This means that the decoders do
+not rely on any hard library cells.
 \subsubsection{Hierarchical Decoder}
 \label{sec:hierdecoder}
-Hierarchical decoder is a type of decoder which the constrcution takes place hierarchically.
-The simple 2:4 decoder is shown in the Figure~\ref{fig:2 to 4 decoder}. The operation of
-this decoder can be explained as follows: soon after the address signals A0 and A1 are put on the address lines,
-depending on the signal combination, one of the wordlines will rise after a brief amount of time. For example if the
-address input is A0A1=00 then the output is W0W1W2W3=1000. The 2:4 address decoder uses inverters and two
-input nand gates for its constrcution while the gates are sized to have equal rise and fall time.
-As the decoder size increases the size of the nand gates required for decoding also increases.
-Table~\ref{table:2-4 hierarchical_decoder} gives the detailed input and output siganls
-for the 2:4 hierarchical decoder.
+
+A simple 2:4 decoder is shown in Figure~\ref{fig:2:4decoder}. This
+decoder computes all of the possible decode values using a single
+level of nand gates along with the inverted and non-inverted inputs.
+As the decoder size increases, the size of the nand gates required for
+decoding would increase in proportion to the number of bits to be
+decoded. This would not be practical for large decoders.
 \begin{figure}[h!]
 \centering
 \includegraphics[scale=.6]{./figs/2t4decoder.pdf}
 \caption{Schematic of 2-4 simple decoder.}
-\label{fig:2 to 4 decoder}
+\label{fig:2:4decoder}
 \end{figure}
- \begin{table}[h!]
- \begin{center}
- \begin{tabular}{| c | c |}
- \hline
- A[1:0] & Selected WL\\ \hline
- 00 & 0\\ \hline
- 01 & 1\\ \hline
- 10 & 2\\ \hline
- 11 & 3\\ \hline
-
- \end{tabular}
- \end{center}
- \caption{Truth table for 2:4 hierarchical decoder.}
- \label{table:2-4 hierarchical_decoder}
- \end{table}
-
-
-An $n$-bit decoder requires {$2^n$} logic gates, each with $n$ inputs. For example, with $n$ = 6,
-64 $NAND6$ gates are needed to drive 64 inverters to implement the decoder.
-It is clear that gates with more than 3 inputs create large series resistances and long delays.
-Rather than using $n$-input gates, it is preferable to use a cascade of gates.
-Typically two stages are used: a predecode stage and a final decode stage.
-The predecode stage generates intermediate signals that are used
-by multiple gates in the final decode stage.
-
-
+A hierarchical decoder uses two levels of decoding hierarchy to
+perform an address decode. The first stage computes predecoded values
+while the second stage computes the final decoded values.
+Figure~\ref{fig:4 to 16 decoder} shows a 4:16 hierarchical
+decoder.
+The decoder uses two 2:4 decoders for
+predecoding and 2-input nand gates and inverters for final decoding to
+form the 4:16 decoder.
 \begin{figure}[h!]
 \centering
 \includegraphics[scale=.6]{./figs/4t16decoder.pdf}
-\caption{Schematic of 4 to 16 hierarchical decoder.}
+\caption{Schematic of 4:16 hierarchical decoder.}
 \label{fig:4 to 16 decoder}
 \end{figure}
-Figure~\ref{fig:4 to 16 decoder} shows the 4 to 16 heirarchical decoder. The structure of the decoder consists of two 2:4 decoders for predecoding and 2-input nand gates and inverters for final decoding to form the 4:16 decoder.
-In the predecoder, a total of 8 intermediate signals are generated from the address bits and their complements.
-The concept of using predecoing and final decoding stage for construction of address decoder is very procutive since small
-decoders like 2:4 decoder is used for predecoding. The operation of 4:16 heirarchical decoder can explained with an example. If the address is A0A1A2A3=0000 the output of the predecoder1 and predeocder2 will be
-WL0WL1WL2WL3=1000 and WL0WL1WL2WL3=1000, respectively. According to the connections in figure~\ref{fig:4 to 16 decoder} the wordline 0 of predecoder1 and predecoder2 are conneted
-to the first 2-input nand gate in the decode stage representing the wordline 0 of the final decoding stage. Hence depengin on the combination
-of the input signal one of the wordline will rise. In this case since the address input is A0A1A2A3=0000 the wordline 0 should go high. Table~\ref{table:4-16 hierarchical_decoder} gives the detailed input and output siganls
-for the 4:16 hierarchical decoder.
+The predecoder generates a total of 8 intermediate signals from the
+address bits and their complements. These intermediate signals form
+two groups of 4, one from each 2:4 decoder. All 4 x 4 combinations of
+the predecoded values are used by the final decode to produce the 16
+decoded results.
+As an example, Table~\ref{table:4-16 hierarchical_decoder}
+gives the detailed input and output signals for the 4:16 hierarchical
+decoder.
 \begin{table}[h!]
@@ -230,43 +209,72 @@ for the 4:16 hierarchical decoder.
 \end{table}
-As the size of the address line increases higher level decoder can be created using the lower level decoders. For example for a 8:256 decoder, two instances of 4:16 followed by 256 2-input nand gates and inverters
-can form the decoder. In order to construct the 8:256 decoder, first 4:16 decoder should be constructed through using 2:4 deccoders. Hence the name is hierarchical decoder.
+As the address size increases, additional sizes of pre- and final
+decoders can be used. In OpenRAM, there are implementations for
+\verb|modules/hierarchical_predecode2x4.py| and
+\verb|modules/hierarchical_predecode3x8.py| to produce 2:4 and 3:8
+predecodes, respectively. These same decoders are used to generate the
+column mux select bits as well.
+
+For the final decode, we can use either pnand2 or pnand3 gates. This
+allows a maximum size of three 3:8 predecoders along with a final pnand3
+decode stage, i.e., $8 \times 8 \times 8 = 512$ word lines. To extend
+beyond this, a pnand4 or a 4:16 predecoder would be needed.
 \subsection{Wordline Driver}
 \label{sec:wldriver}
-Word line drivers are inserted, in between the word line
-output of the address decoder and the word line input of the bitcell-array. The word
-line drivers ensure that as the size of the memory array increases,
-and the word line length and capacitance increases, the word line
-signal is able to turn on the access transistors in the 6T cell. Also, as the bank select signal
-in multi-bank structures is $ANDED$ with the word line output of decoder,
-bitcells turn on only when bank is selected.
-Figure~\ref{fig:wordline_driver} shows the diagram of word line driver and its input/output pins.
-In OpenRAM, word line drivers are created by using the \verb|pinv| and \verb|nand2| classes which
-takes the transistor size and cell height as inputs (so that it can abutt the
-6T cell). Word line driver is added as seperate module in \verb|compiler|.
+The word line driver buffers the address decoder output to drive the
+wordline and gates the signal until the decode has stabilized. Without
+waiting, an incorrectly asserted wordline could erase memory contents.
+The word line driver is sized according to the bitcell array width so
+that wordlines in larger memory arrays can be appropriately driven.
+
+% gating for first half decode, second half read/write
+The first half of the clock cycle is used for address decoding in
+OpenRAM. Therefore, the wordline driver is enabled in the second half
+of the clock cycle. The buffered clock signal drives each
+wordline driver row and is logically ANDed with the decoder output.
+
+% bank clock gating for wordline driver
+In multi-bank structures the buffered clock is also ANDed with the bank
+select signal so that an unselected bank is neither read nor written.
+
 \begin{figure}[h!]
 \centering
-\includegraphics[scale=.8]{./figs/wordline_driver.pdf}
+\includegraphics[scale=.6]{./figs/wordline_driver.pdf}
 \caption{Diagram of word line driver.}
 \label{fig:wordline_driver}
 \end{figure}
+Figure~\ref{fig:wordline_driver} illustrates the wordline driver and
+its inputs/outputs. This is implemented in the
+\verb|modules/wordline_driver.py| module and matches the number of
+rows in the bitcell array of a bank.
+
+OpenRAM creates the wordline drivers using the parameterized pinv and
+pnand2 classes. This enables the wordline driver to be matched to the
+bitcell height and to be sized to drive the wordline load.
+
+
 \subsection{Column Mux}
 \label{sec:column_mux}
-The column mux takes the column address bits from the address bus
-selects the appropriate bitlines for the word that is to be read from
-or written to.
-It takes n-bits from the address bus and can select
-$2^n$ bitlines. The column mux is used for both the read and write
-operations; it connects the bitline of the memory array to both the
-sense ampflifier and the write driver.
+The column mux is an optional module in an SRAM bank. Without a column
+mux, the bank is assumed to have a single word in each row. A column
+mux enables more than one word to be stored in each row and
+read/written individually. The column mux is used for both the read
+and write operations by connecting the bitlines of a bank to
+both the sense amplifier and the write driver.
-OpenRAM provides several options for column mux, but the default
-is a single-level column mux which is sized for optimal speed.
+In OpenRAM, the column mux uses the {\bf high address bits} to select
+the appropriate word in each row. If $n$ bits are used, there are $2^n$
+words in each row. OpenRAM currently allows 2, 4, or 8 words per row,
+but the 8-word configuration is not fully debugged (as of 2/12/18).
+
+%% OpenRAM provides several options for column mux, but the default
+%% is a single-level column mux which is sized for optimal speed.
 %% \subsubsection{Tree\_Decoding Column Mux}
 %% \label{sec:tree_decoding_column_mux}
@@ -352,30 +360,37 @@ is a single-level column mux which is sized for optimal speed.
 %% it is connected to.
-\subsubsection{Single\_Level Column Mux}
+\subsubsection{Single-Level Column Mux}
 \label{sec:single_level_column_mux}
-The optimal design for column mux uses a single NMOS device, driven by the input address or decoded input addresses.
-Figure~\ref{fig:2t1_single_level_column_mux} shows the schematic of a 2:1 single-level column mux. In this column mux one bit
-of address and its complementry drive the pass transistors. Selected transistors will
-connect their corresponding bitlines ( 1 set of column out of 2 set of columns) to sense-amp and write-driver circuitry for read or write operation.
-Figure~\ref{fig:4t1_single_level_column_mux} shows the schematic of a 4:1 single-level column mux. In this column mux, 2 input
-address are decoded using a 2:4 decoder ( 2:4 decoder is explain in section~\ref{sec:hierdecoder}). 2:4 decoder provides a one-hot set of outputs, so only one set of columns
-will be selected and connected to sense-amp and write-driver
-( in figure~\ref{fig:4t1_single_level_column_mux} one set of column out of four sets of column is selected).
+OpenRAM includes a single-level pass-gate mux implementation for the
+column mux. A single level of NMOS devices is driven by either the
+input address (and its complement) or decoded input addresses using a
+2:4 predecoder (Section~\ref{sec:hierdecoder}).
-In OpenRAM, the \verb|single-level_mux_array| is a dynamically generated design and
-it is made up of dynamically generated cell (\verb|single-level_mux|).
-\verb|single-level_mux| uses the parameterized transistor class \verb|ptx| to generate two NMOS transistors
-which will connect the BL and BLB of selected columns to sense-amp and write-driver. Horizontal rails are added for $sel$ signals. Vertical
-straps connect the BL and BLB of bitcell\_array to BL and BLB of single-level column mux and also BL-out and BLB-out of single-level
-column mux to BL and BLB of sense-amp and write-driver.
+Figure~\ref{fig:2t1_single_level_column_mux} shows the schematic of a
+2:1 single-level column mux. In this column mux, the {\bf MSB of the
+ address bus} and its complement drive the pass transistors.
+
+Figure~\ref{fig:4t1_single_level_column_mux} shows the schematic of a
+4:1 single-level column mux. The select bits are decoded from the {\bf
+ 2 MSBs of the address bus} using a 2:4 decoder. The 2:4 decoder
+provides one-hot select signals to select one column.
+
+In OpenRAM, one mux, single\_level\_mux, is dynamically generated in
+\verb|modules/single_level_column_mux.py| and multiple copies of this mux
+are tiled together in \verb|modules/single_level_column_mux_array.py|.
+
+single\_level\_mux uses the parameterized ptx (Section~\ref{sec:ptx})
+to generate 2 or 4 NMOS transistors for each of the bl and br
+bitlines. Horizontal rails are added for the $sel$ signals. The
+bitlines are automatically pitch-matched to the bitcell array.
 \begin{figure}[h!]
 \centering
-\includegraphics[scale=.7]{./figs/2t1_single_level_column_mux.pdf}
-\caption{Schematic of a 2:1 single level column mux.}
+\includegraphics[scale=.5]{./figs/2t1_single_level_column_mux.pdf}
+\caption{Schematic of a 2:1 single level column mux. \fixme{Signal names are wrong.}}
 \label{fig:2t1_single_level_column_mux}
 \end{figure}
@@ -383,8 +398,8 @@ column mux to BL and BLB of sense-amp and write-driver.
 \begin{figure}[h!]
 \centering
-\includegraphics[scale=.6]{./figs/4t1_single_level_column_mux.pdf}
-\caption{Schematic of a 4:1 single level column mux.}
+\includegraphics[scale=.5]{./figs/4t1_single_level_column_mux.pdf}
+\caption{Schematic of a 4:1 single level column mux. \fixme{Signal names are wrong.}}
 \label{fig:4t1_single_level_column_mux}
 \end{figure}
diff --git a/docs/parameterized.tex b/docs/parameterized.tex
index bcd89871..e85670bb 100644
--- a/docs/parameterized.tex
+++ b/docs/parameterized.tex
@@ -126,7 +126,7 @@ height=tech.cell_6t["height"])
 \begin{figure}[h!]
 \centering
-\includegraphics[width=10cm]{./figs/nand2.pdf}
+%\includegraphics[width=10cm]{./figs/nand2.pdf}
 \caption{An example of Parameterized NAND2(nand\_2)}
 \label{fig:nand2}
 \end{figure}
@@ -169,7 +169,7 @@ height=tech.cell_6t["height"])
 \begin{figure}[h!]
 \centering
-\includegraphics[width=10cm]{./figs/nand3.pdf}
+%\includegraphics[width=10cm]{./figs/nand3.pdf}
 \caption{An example of Parameterized NAND3(nand\_3)}
 \label{fig:nand3}
 \end{figure}