# Processor Branches and RAM

## CS 315 Computer Architecture

---

## Where We Are

The processor can already execute R-type and I-type instructions plus `jal`/`jalr`.

**Two new capabilities needed today:**

- **Conditional branches**: `beq`, `bne`, `blt`, `bge`
- **Data memory (RAM)**: `lb/sb`, `lw/sw`, `ld/sd`

| Category | Instructions |
|----------|-------------|
| Data processing | `addi`, `add`, `sub`, `mul`, shifts |
| Control | `jal`, `jalr`, **beq**, **bne**, **blt**, **bge** |
| Memory | **lb/sb**, **lw/sw**, **ld/sd** |

---

## High-Level Datapath
flowchart LR A["Instruction Word"] --> B["InstDecoder"] B --> C["Data Processing\n(ALU)"] B --> D["Control\n(PC update)"] B --> E["Memory\n(RAM)"] D --> F["Branch Unit + PCsel"] E --> G["Load/Store logic"]
The InstDecoder sets control lines; new behavior lives in dedicated components.

---

## Conditional Branch: Two Steps

```asm
beq t0, t1, label    # if (t0 == t1) goto label
```

**Step 1: Compute Branch Target Address (BTA):**

```text
BTA = PC + imm-b
```

`imm-b` is the sign-extended B-type immediate from `ImmDecoder`. Negative offsets enable backward (loop) branches.

**Step 2: Decide whether to take it:**

Compare `rs1` vs `rs2`. If true: `PC = BTA`, else `PC = PC + 4`.

---

## Branch vs. Jump vs. Sequential

| Instruction class | PC update |
|-------------------|-----------|
| Sequential (`add`, `addi`, …) | always `PC + 4` |
| `jal` / `jalr` (jumps) | always target address |
| `beq` / `bne` / `blt` / `bge` | target **if** comparison true, else `PC + 4` |
Jumps **always** redirect the PC. Branches redirect **only if** the comparison succeeds.
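Step 1 of the two-step description (computing `BTA = PC + imm-b`) can be sketched as a small behavioral model. This is only an illustration, not the course's `ImmDecoder` circuit: the function names `imm_b` and `branch_target` are hypothetical, and the bit positions follow the standard RISC-V B-type encoding.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers with Python ints


def imm_b(inst: int) -> int:
    """Return the sign-extended B-type immediate of a 32-bit instruction word."""
    imm = (
        ((inst >> 31) & 0x1) << 12      # imm[12]   comes from inst[31]
        | ((inst >> 7) & 0x1) << 11     # imm[11]   comes from inst[7]
        | ((inst >> 25) & 0x3F) << 5    # imm[10:5] comes from inst[30:25]
        | ((inst >> 8) & 0xF) << 1      # imm[4:1]  comes from inst[11:8]
    )                                   # imm[0] is always 0
    if imm & (1 << 12):                 # bit 12 is the sign bit
        imm -= 1 << 13
    return imm


def branch_target(pc: int, inst: int) -> int:
    """BTA = PC + imm-b, wrapped to 64 bits."""
    return (pc + imm_b(inst)) & MASK64
```

For example, `0x00028863` encodes `beq t0, zero, +16`, so `branch_target(0x1000, 0x00028863)` gives `0x1010`. A negative immediate yields a target below the current PC, which is what makes loop branches work.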
---

## The Four Branch Comparisons

| Instruction | Comparison | `BUOp` |
|-------------|------------|--------|
| `beq` | `rs1 == rs2` | `00` |
| `bne` | `rs1 != rs2` | `01` |
| `blt` | `rs1 < rs2` (signed) | `10` |
| `bge` | `rs1 >= rs2` (signed) | `11` |
`bgt rs1, rs2, label` is a pseudo-instruction assembled as `blt rs2, rs1, label`, so only four comparisons are needed in hardware.
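The comparison table above can be modeled in a few lines. This is a hedged sketch rather than the actual circuit: `branch_unit` and `to_signed64` are illustrative names, and the model computes all four comparisons and then selects one with `BUOp`, much as a MUX would. `blt` and `bge` compare the operands as signed 64-bit values, as the RISC-V ISA specifies.

```python
BEQ, BNE, BLT, BGE = 0b00, 0b01, 0b10, 0b11  # BUOp encodings from the table


def to_signed64(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x & (1 << 63) else x


def branch_unit(a: int, b: int, buop: int) -> int:
    """Return PCbr (1 = comparison true) for A = rs1, B = rs2."""
    sa, sb = to_signed64(a), to_signed64(b)
    # All four comparators run in parallel; BUOp picks one result.
    results = (sa == sb, sa != sb, sa < sb, sa >= sb)
    return int(results[buop & 0b11])
```

With this model, `bgt rs1, rs2` is simply `branch_unit(rs2, rs1, BLT)`, mirroring the operand swap the assembler performs.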
---

## Branch Flow
flowchart TD A["Fetch branch instruction"] --> B["Compute BTA = PC + imm-b (ALU)"] A --> C["Read rs1, rs2 from RegFile"] C --> D{"Branch Unit:\ncomparison true?"} D -- yes --> E["PC = BTA"] D -- no --> F["PC = PC + 4"]
---

## The Branch Unit (BU)

The ALU is busy computing BTA, so comparisons go in a dedicated **Branch Unit**.

**Inputs:** `A` (rs1, 64b), `B` (rs2, 64b), `BUOp` (2b)

**Output:** `take_branch` / `PCbr` (1b)
flowchart LR A["A = rs1 (64b)"] --> EQ["="] B["B = rs2 (64b)"] --> EQ A --> NE["!="] B --> NE A --> LT["<"] B --> LT A --> GE[">="] B --> GE EQ --> M["MUX\n(BUOp)"] NE --> M LT --> M GE --> M M --> O["take_branch (PCbr)"]
---

## Branch Unit Design Notes

- **Do not decode `funct3` directly** inside the BU: RISC-V branch `funct3` codes are non-contiguous (`beq=000`, `bne=001`, `blt=100`, `bge=101`). Use a clean 2-bit `BUOp` from the InstDecoder instead.
- **Add a "BU off" state**: non-branch instructions must not accidentally signal a taken branch. Set `PCsel = 0` for every instruction that is neither a branch nor a jump.
- The BU is **purely combinational**: no clock needed.

---

## PC Selection: PCsel + PCbr

**Policy:**

- `PCsel = 0` → PC always advances to `PC+4` (sequential)
- `PCsel = 1` → PC redirected (branch or jump); for a branch, the outcome depends on `PCbr`

**Option 1 (preferred): inner MUX driven by `PCbr`**
flowchart TD PCbr["PCbr (Branch Unit)"] --> IMUX["inner MUX"] PC4a["PC+4"] --> IMUX BTA["BTA (from ALU)"] --> IMUX IMUX --> PMUX["PCsel MUX"] PC4b["PC+4"] --> PMUX JTA["JTA"] --> PMUX PCsel["PCsel (InstDecoder)"] --> PMUX PMUX --> PC["PC register"]
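The PC selection policy can be sketched as a function of the control lines. This is an assumption-laden model, not the reference design: the `is_jump` flag is hypothetical and stands in for however the decoder tells a jump (which uses the JTA) apart from a branch (which goes through the inner MUX).

```python
MASK64 = (1 << 64) - 1


def next_pc(pc: int, pcsel: int, pcbr: int, bta: int,
            jta: int = 0, is_jump: bool = False) -> int:
    """Model of the PCsel MUX with an inner MUX driven by PCbr."""
    seq = (pc + 4) & MASK64
    if pcsel == 0:
        return seq                  # sequential: BU output is ignored
    if is_jump:                     # hypothetical flag: jal/jalr take the JTA
        return jta & MASK64
    # Inner MUX: a branch goes to BTA only when the comparison succeeded.
    return (bta & MASK64) if pcbr else seq
```

Because `PCsel = 0` bypasses the inner MUX entirely, a stray `PCbr = 1` from the Branch Unit cannot redirect a non-branch instruction in this model.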
---

## New Control Lines

| Signal | Width | Meaning |
|--------|-------|---------|
| `PCsel` | 1 | Redirect PC? (`0`=seq, `1`=branch/jump) |
| `BUOp` | 2 | Branch Unit comparison selector |
| `PCbr` | 1 | BU output: 1 if comparison succeeded |
The InstDecoder spreadsheet must add `PCsel` and `BUOp` columns. Set `PCsel = 0` for every existing instruction that does not redirect the PC (everything except `jal`/`jalr`).
---

## Worked Example: Countdown Loop

```asm
main:
    li   t0, 3            # counter = 3
    li   t1, 0            # accumulator = 0
loop:
    beq  t0, zero, done   # exit when counter == 0
    add  t1, t1, t0       # acc += counter
    addi t0, t0, -1       # counter--
    jal  loop             # j loop
done:
    add  a0, t1, zero     # return 3+2+1 = 6
    unimp
```

Computes `3 + 2 + 1 = 6` into `a0`.

---

## Loop Trace: Control Signals

| Instruction | `PCsel` | `BUOp` | `PCbr` | Next PC |
|-------------|---------|--------|--------|---------|
| `beq` (t0=3) | 1 | `00` | 0 | `PC+4` |
| `add t1,t1,t0` | 0 | x | x | `PC+4` |
| `addi t0,t0,-1` | 0 | x | x | `PC+4` |
| `jal loop` | 1 | x | n/a | JTA |
| `beq` (t0=0) | 1 | `00` | 1 | BTA (done) |

The backward `jal` works because the J-type immediate is sign-extended (negative offset).

---

## Data Memory: Adding RAM

Programs need a **stack** for arrays, saved registers, and the calling convention.

We use Digital's **RAM (Separated Ports)** component:

- **Data bits:** 64, so each cell holds one doubleword and `ld`/`sd` become trivial
- **Address bits:** determined by desired total size

| Sub-circuit | Purpose |
|-------------|---------|
| PC + Inst Mem | Fetch |
| Reg File + ALU | Execute |
| **Data RAM** | Load/Store |

---

## Sizing the RAM

**Goal: 1024-byte data memory from 64-bit cells**

```text
Bytes per cell      = 64 bits / 8 = 8 bytes = 2^3
Number of cells     = 1024 / 8    = 128     = 2^7
Address bits needed = 7
Check: 2^3 * 2^7 = 2^10 = 1024 bytes ✓
```

| Quantity | Value |
|----------|-------|
| Bits per cell | 64 (`2^6`) |
| Bytes per cell | 8 (`2^3`) |
| Number of cells | 128 (`2^7`) |
| Address bits | 7 |
| Total size | 1024 bytes (`2^10`) |

---

## Byte Address vs. Doubleword Address

The ALU produces a **byte address**; the RAM expects a **doubleword (DW) address**.

```text
DW_addr = byte_addr >> 3    (drop the low 3 bits)
```

```text
byte_addr bits:  ... 9 8 7 6 5 4 3 | 2 1 0
                     \-----------/   \---/
                      DW address     byte offset
                     (to RAM ADDR)   (discarded for ld/sd)
```

Use a **splitter** in Digital to drop bits 0-2 and feed the high bits to RAM ADDR.

---

## RAM Connection Diagram
flowchart LR ALU["ALU result\n(byte addr, 64b)"] --> SP["splitter:\ndrop low 3 bits"] SP --> RAM["RAM ADDR\n(DW address)"] RAM -->|"D out 64b"| LD["load logic\n→ RegFile"] SI["store logic\n(from RegFile)"] -->|"D in 64b"| RAM
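The sizing arithmetic and the byte-to-doubleword address conversion can be checked with a short model. The names here (`split_address`, the constants) are illustrative, and masking the DW address to 7 bits is an assumption about how the splitter is wired for a 1024-byte RAM (address bits 9-3 feed RAM ADDR, higher bits are ignored).

```python
CELL_BITS = 64
RAM_BYTES = 1024

BYTES_PER_CELL = CELL_BITS // 8                 # 8 bytes per cell
NUM_CELLS = RAM_BYTES // BYTES_PER_CELL         # 128 cells
ADDR_BITS = (NUM_CELLS - 1).bit_length()        # 7 address bits


def split_address(byte_addr: int) -> tuple[int, int]:
    """Split an ALU byte address into (DW address, byte offset)."""
    dw_addr = (byte_addr >> 3) & (NUM_CELLS - 1)    # bits 9..3 -> RAM ADDR
    byte_offset = byte_addr & 0b111                 # bits 2..0, unused by ld/sd
    return dw_addr, byte_offset
```

With `sp` initialized to 1024, the first doubleword pushed at `sp - 8 = 1016` lands in the last cell: `split_address(1016)` returns `(127, 0)`.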
---

## New Memory Control Lines

| Signal | Meaning |
|--------|---------|
| `LD` | RAM read enable (load) |
| `ST` | RAM write enable (store) |
| `MSZ` (2b) | Memory size: byte / word / doubleword |
| `M2R` / `WDsel` | Select RAM output for register write-back |

Keep the RAM component at the **top level** of the processor so you can inspect it during simulation.

---

## `ld`/`sd`: The Easy Case

Each RAM cell is 64 bits = one doubleword.

- **`ld`**: read the cell at DW address, write all 64 bits to register
- **`sd`**: write all 64 bits of register to the cell at DW address

No sub-word logic needed.
This is why we chose 64-bit cells: `ld`/`sd` map directly to single cell reads and writes.
---

## `lw`/`sw`: Word Access

A word is 32 bits; the cell is 64 bits. Need helper logic.

**Load word:**

1. Convert byte address to DW address (drop bits 2-0)
2. Read 64-bit cell; split into lower [31:0] and upper [63:32]
3. **Bit 2** of byte address selects which half (word index)
4. Sign-extend 32 → 64 bits

```text
bit 2 = 0 → lower word (bits 0..31)
bit 2 = 1 → upper word (bits 32..63)
```

**Store word:**

Assert both `LD` and `ST`. Read-modify-write: preserve the untouched 32-bit half, replace the selected half with new data.

---

## `lb`/`sb`: Byte Access

Same pattern as word access, but using **bits 2-0** to pick 1 of 8 bytes.

**MSZ encoding (follows RISC-V `funct3` low bits):**

| Operation | `MSZ` | Width |
|-----------|-------|-------|
| `lb` / `sb` | `00` | 8 bits, sign-extended to 64 |
| `lw` / `sw` | `10` | 32 bits, sign-extended to 64 |
| `ld` / `sd` | `11` | 64 bits |

---

## Load Path: MSZ MUX
flowchart TD RAMOUT["RAM D out (64b)"] --> LDD["ld: full 64b"] RAMOUT --> SPW["split words, MUX by bit 2"] SPW --> SXW["sign-extend 32→64"] RAMOUT --> SPB["split bytes, MUX by bits 2-0"] SPB --> SXB["sign-extend 8→64"] LDD --> MSZMUX["MSZ MUX"] SXW --> MSZMUX SXB --> MSZMUX MSZMUX --> WB["M2R MUX → RegFile"]
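The load path can be approximated in software as follows. This is a sketch under stated assumptions: `load` and `sign_extend` are illustrative names, and byte 0 is taken to be bits 7..0 of the cell, extending the stated word layout (bit 2 = 0 selects bits 0..31) to bytes.

```python
MASK64 = (1 << 64) - 1
MSZ_BYTE, MSZ_WORD, MSZ_DW = 0b00, 0b10, 0b11   # MSZ encodings from the table


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64


def load(cell: int, byte_addr: int, msz: int) -> int:
    """Pick the value written back to the register from a 64-bit RAM cell."""
    if msz == MSZ_DW:
        return cell & MASK64                        # ld: whole cell
    if msz == MSZ_WORD:
        word_index = (byte_addr >> 2) & 0b1         # bit 2 selects the half
        return sign_extend(cell >> (32 * word_index), 32)
    if msz == MSZ_BYTE:
        byte_index = byte_addr & 0b111              # bits 2-0 select the byte
        return sign_extend(cell >> (8 * byte_index), 8)
    raise ValueError(f"unsupported MSZ: {msz:#04b}")
```

For a cell holding `0x8877665544332211`, a word load at byte address 4 returns `0xFFFFFFFF88776655`, since the upper word has its top bit set and is sign-extended.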
---

## Store Word: Read-Modify-Write

```text
D64cur = current cell from RAM (both LD and ST asserted)
W0     = D64cur[31:0]   (current lower word)
W1     = D64cur[63:32]  (current upper word)
Wnew   = lower 32 bits of RD1 (new value to store)

Candidates, written as lower word : upper word:
  bit 2 = 0:  Wnew : W1    (replace lower half)
  bit 2 = 1:  W0 : Wnew    (replace upper half)

MSZ MUX selects the right candidate → RAM Din
```
Always add load logic **after** the RAM and store logic **before** the RAM in the circuit layout.
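The read-modify-write step can be modeled with a mask instead of explicit candidates. This sketch is behaviorally equivalent to building the candidates and selecting one, but it is not the circuit itself: `store_merge` is a hypothetical name, and the byte case assumes byte 0 occupies bits 7..0 of the cell.

```python
MASK64 = (1 << 64) - 1
MSZ_BYTE, MSZ_WORD, MSZ_DW = 0b00, 0b10, 0b11   # MSZ encodings


def store_merge(d64cur: int, rd1: int, byte_addr: int, msz: int) -> int:
    """Return the 64-bit value to drive on RAM Din for sb/sw/sd."""
    if msz == MSZ_DW:
        return rd1 & MASK64                     # sd: replace the whole cell
    if msz == MSZ_WORD:
        width, index = 32, (byte_addr >> 2) & 0b1
    elif msz == MSZ_BYTE:
        width, index = 8, byte_addr & 0b111
    else:
        raise ValueError(f"unsupported MSZ: {msz:#04b}")
    shift = width * index
    field = ((1 << width) - 1) << shift         # bits being replaced
    kept = d64cur & ~field & MASK64             # untouched part of the cell
    return kept | ((rd1 << shift) & field)      # insert low bits of RD1
```

Storing `0xDEADBEEF` as a word at byte address 4 into a cell holding `0x8877665544332211` produces `0xDEADBEEF44332211`: the lower word is preserved and only the upper word changes.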
---

## Program Initialization

The processor starts blank: all registers are 0 and memory is uninitialized. Every test program needs explicit setup:

```asm
main:
    li   sp, 1024     # initialize stack pointer
    li   a0, 5        # parameter 1
    li   a1, 10       # parameter 2
    jal  myfunc       # use jal, NOT call
    unimp             # end marker → processor halts
```

- Remove `.global` directives
- Use `jal` instead of `call`
- End with `unimp`

---

## ROM / Decoder Programming

The InstDecoder ROM is keyed by `INUM`.
Direct copy-paste of large hex values is unreliable. Generate a `.hex` file with the required prefix and load it directly into the ROM, the same approach used for instruction memory via `makerom3.py`.
**Lab 10 / Project 6 file naming:**

| File | Decoder |
|------|---------|
| `inst-decode-part1.dig` | Lab 10 Part 1 (`addi`, `add`, `unimp`) |
| `inst-decode-part2.dig` | Lab 10 Part 2 (+ `jal`, `jalr`) |

---

## Key Concepts Reference

| Concept | Definition |
|---------|------------|
| **BTA** | `PC + imm-b`: where a taken branch goes |
| **imm-b** | Sign-extended B-type immediate (PC-relative) |
| **Branch Unit** | Compares rs1, rs2 → outputs `PCbr` |
| **BUOp** | `00`=`==`, `01`=`!=`, `10`=`<`, `11`=`>=` |
| **PCsel** | `1` = redirect PC (branch/jump), `0` = sequential |
| **PCbr** | 1 when branch comparison succeeds |
| **MSZ** | `00`=byte, `10`=word, `11`=doubleword |
| **M2R/WDsel** | Selects RAM output for register write-back |
| **DW addr** | `byte_addr >> 3`: index into 8-byte cells |

---

## Summary

1. **Two new capabilities**: conditional branches and data memory (RAM).
2. **Branches are two-step**: compute `BTA = PC + imm-b`, then compare `rs1`/`rs2` via the Branch Unit to set `PCbr`.
3. **Branch Unit**: four comparators selected by 2-bit `BUOp`; outputs `take_branch`. Use a clean `BUOp` instead of raw `funct3`.
4. **PCsel + PCbr**: `PCsel = 0` always gives `PC+4`; `PCsel = 1` uses an inner MUX (driven by `PCbr`) to choose between `PC+4` and `BTA`.
5. **RAM sizing**: 1024-byte RAM = 128 cells of 64 bits = 7 address bits. Convert byte address to DW address by dropping low 3 bits.
6. **Sub-word access**: use bit 2 for word selection, bits 2-0 for byte selection; stores need read-modify-write; `MSZ` MUX selects byte/word/DW output.
7. **Program init**: set `sp`, use `jal` not `call`, end with `unimp`.