# Lab: Project 07 and Final Review

## CS 315 Computer Architecture

---

## Session Goals

- Finish Project 07 Hazard Unit
- Review pipeline timing diagrams (cycle counting)
- Classify hazards: forward / stall / flush
- Extend the datapath for new instructions (LWU, SWPR)

flowchart LR A[Lab 11 problems] --> B[Pipeline timing diagrams] B --> C[Hazard reasoning] C --> D[Project 07 Hazard Unit] A --> E[Add-an-instruction] E --> F[Final exam prep] C --> F

---

## Final Exam Format

- **Color circuit diagrams**: read and annotate datapaths
- **Cycle counting and pipelining**: count cycles, identify each forward/stall/flush
- **Add an instruction**: describe datapath, control, component, decoder changes
- **Sum-of-products**: truth tables to boolean equations to gate-level circuits

Project is weighted ~25% pre-midterm, ~75% post-midterm. Autograder runs as-is.

---

## The Five-Stage Pipeline

| Stage | Abbrev | What happens |
|-------|--------|--------------|
| Instruction Fetch | **I** (IF) | Read instruction, compute PC+4 |
| Decode / Reg Read | **D** (DR) | Decode; read RR0/RR1 into RD0/RD1; build immediate |
| Execute | **E** (EX) | ALU computes result; branch target computed |
| Memory | **M** (MEM) | Load or store to data RAM |
| Write Back | **W** (WB) | Write result into RegFile |

Stage suffixes in Project 07: `_2` = DR/EX boundary, `_3` = EX/MEM, `_4` = MEM/WB

---

## Pipeline Registers

flowchart LR PC[PC] --> IFDR[IF/DR reg] IFDR --> DREX[DR/EX reg] DREX --> EXMEM[EX/MEM reg] EXMEM --> MEMWB[MEM/WB reg] MEMWB --> RF[(RegFile)] RF -. read RD0/RD1 .-> DREX

- Each register carries one instruction's signals forward one stage per clock
- `RR0_2` = read-reg-0 of instruction entering EX
- `WR_3` = destination reg of instruction in MEM
- `RFW_4` = register-file-write flag of instruction in WB

---

## Timing Diagram: Forwarding Only

Program (full pipeline with hazard unit):

```text
addi a1, zero, 3
addi a2, zero, 4
add  a0, a1, a2      # a0 = 7
```

```text
           c1   c2   c3   c4   c5   c6   c7
addi a1   | I  | D  | E  | M  | W  |    |    |
addi a2   |    | I  | D  | E  | M  | W  |    |
add a0    |    |    | I  | D  | E  | M  | W  |
```

**Last WB: cycle 7.** No stalls are needed because forwarding resolves both dependencies.

---

## Why Forwarding Works Here

In cycle 5, when `add` is in EX:

- `addi a2` is in MEM → its result `ALUR_3` can be forwarded
- `addi a1` is in WB → its result `MR_4` can be forwarded

flowchart LR A1["addi a1 (in WB)"] -->|MR_4 forwarded| EX["add in EX"] A2["addi a2 (in MEM)"] -->|ALUR_3 forwarded| EX

Both values exist in the pipeline: **no stall, no flush needed**.

---

## Forwarding Hazard Unit Logic

```text
// FRD0 selector (FRD1 is symmetric with RR1_2)
if ((RR0_2 == WR_3) && RFW_3) {
    FRD0 = 2;   // forward ALUR_3 (EX/MEM, closest)
} else if ((RR0_2 == WR_4) && RFW_4) {
    FRD0 = 1;   // forward MR_4 (MEM/WB)
} else {
    FRD0 = 0;   // use RegFile value
}
```

Closest wins. Always test `_3` before `_4`. Only forward if `RFW` is set.
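
The selector logic above can be mirrored in a few lines of Python for desk-checking a test case before wiring it in Digital. This is only a sketch: the function name `forward_select` and its argument names are hypothetical, and the 0/1/2 encoding is taken from the pseudocode above (2 = `ALUR_3`, 1 = `MR_4`, 0 = RegFile value).

```python
def forward_select(rr_2, wr_3, rfw_3, wr_4, rfw_4):
    """Return the FRD selector for one EX-stage source register.

    Hypothetical helper mirroring the pseudocode: 2 = ALUR_3, 1 = MR_4,
    0 = RegFile value. The EX/MEM match is tested first so the closest
    producer wins.
    """
    if rr_2 == wr_3 and rfw_3:
        return 2
    if rr_2 == wr_4 and rfw_4:
        return 1
    return 0


# Register numbers are plain integers here; 10 stands in for a0.
# Both older instructions write register 10, so the closer one is chosen.
print(forward_select(10, wr_3=10, rfw_3=True, wr_4=10, rfw_4=True))   # 2
# A matching destination is ignored when its RFW flag is clear.
print(forward_select(10, wr_3=10, rfw_3=False, wr_4=11, rfw_4=True))  # 0
```

Calling the same function with `RR1_2` as the first argument gives the `FRD1` selector.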

---

## Forwarding Priority: "Closest Wins"

```text
addi a0, zero, 3
addi a0, zero, 4
add  a0, a0, a0      # must see 4, not 3
```

When `add` is in EX: `ALUR_3 = 4` (from second addi, in MEM), `MR_4 = 3` (from first addi, in WB)

flowchart TD A["add a0, a0, a0 in EX wants a0"] --> B{"RR0_2 == WR_3 and RFW_3?"} B -- yes --> C["FRD0 = 2, forward ALUR_3 = 4"] B -- no --> D{"RR0_2 == WR_4 and RFW_4?"} D -- yes --> E["FRD0 = 1, forward MR_4 = 3"] D -- no --> F["FRD0 = 0, use RegFile"]

Getting priority backwards is the **most common forwarding bug**.

---

## EX Stage MUX

```text
          +-----------+
RD0 ----->|0          |
MR_4 ---->|1   MUX    |---> to ALU A (replaces RD0 everywhere)
ALUR_3 -->|2          |
          +-----------+
                ^
               FRD0
```

The MUX output must replace `RD0`/`RD1` **everywhere** in EX, not just at the ALU input. Forgetting this is a common wiring mistake.

---

## The Load-Use Stall

Forwarding cannot help when a load's value is needed by the very next instruction, because the value isn't ready until the end of MEM.

```text
addi a1, zero, 3
sd   a1, 0(zero)
ld   a2, 0(zero)
add  a0, a2, a2      # a0 = 6
```

```text
           c1   c2   c3   c4   c5   c6   c7   c8   c9
addi a1   | I  | D  | E  | M  | W  |    |    |    |    |
sd a1     |    | I  | D  | E  | M  | W  |    |    |    |
ld a2     |    |    | I  | D  | E  | M  | W  |    |    |
add a0    |    |    |    | I  | D  |(B) | E  | M  | W  |
```

**Last WB: cycle 9.** One stall (bubble B), then forward.

---

## Load-Use Stall Logic

Detect: a load (`MLD_3`) that writes a register (`RFW_3`) whose destination `WR_3` matches a source register (`RR0_2` or `RR1_2`) of the dependent instruction.

```text
if (RFW_3 && MLD_3 && ((RR0_2 == WR_3) || (RR1_2 == WR_3))) {
    PC_EN      = 0;        // freeze PC
    IF_DR_EN   = 0;        // freeze IF/DR
    DR_EX_EN   = 0;        // freeze DR/EX
    EX_MEM_CLR = 1;        // inject bubble
} else {
    PC_EN      = EN_ORG;
    IF_DR_EN   = 1;
    DR_EX_EN   = 1;
    EX_MEM_CLR = CLR_ORG;
}
```

Preserve `EN_ORG`/`CLR_ORG` in the else branch: manual single-stepping requires it.
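
The stall equations above can be checked against a test case with a small Python model. This is a sketch under stated assumptions: the function name `stall_controls` is hypothetical, and the defaults `en_org=1` and `clr_org=0` stand in for whatever the original enable and clear lines carry in your circuit.

```python
def stall_controls(rfw_3, mld_3, rr0_2, rr1_2, wr_3, en_org=1, clr_org=0):
    """Return (PC_EN, IF_DR_EN, DR_EX_EN, EX_MEM_CLR) for one cycle.

    Hypothetical model of the load-use pseudocode; en_org and clr_org
    are assumed default values for the original EN/CLR lines.
    """
    load_use = bool(rfw_3 and mld_3 and (rr0_2 == wr_3 or rr1_2 == wr_3))
    if load_use:
        # Freeze PC, IF/DR, DR/EX and clear EX/MEM to inject a bubble.
        return (0, 0, 0, 1)
    # No hazard: pass the original enable and clear lines through.
    return (en_org, 1, 1, clr_org)


# ld a2 followed by add a0, a2, a2 (register 12 stands in for a2).
print(stall_controls(rfw_3=1, mld_3=1, rr0_2=12, rr1_2=12, wr_3=12))
# Same destination match, but the producer is not a load: no stall.
print(stall_controls(rfw_3=1, mld_3=0, rr0_2=12, rr1_2=12, wr_3=12))
```

The first call reports a stall, the second passes `EN_ORG` and `CLR_ORG` through unchanged.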

---

## Back-to-Back Loads: One Stall Suffices

```text
ld  a2, 0(zero)
ld  a3, 8(zero)
add a0, a2, a3
```

```text
           c1   c2   c3   c4   c5   c6   c7   c8
ld a2     | I  | D  | E  | M  | W  |    |    |    |
ld a3     |    | I  | D  | E  | M  | W  |    |    |
add a0    |    |    | I  | D  |(--)| E  | M  | W  |
```

After **one** bubble, `add` is in EX at c6:

- `a3` forwarded from `ld a3` MEM/WB result
- `a2` also available to forward (ready a cycle earlier)

**Cycles = 3 + 4 + 1 = 8. Only one stall.**

---

## Control Hazards: Flushing

When `jal`/`beq`/etc. change the PC, already-fetched instructions must be discarded.

```text
main:
    li   a0, 3
    jal  foo           # jump here
    unimp              # must NOT execute
foo:
    addi a0, a0, 4     # a0 = 7
```

**Resolve the jump in EX**, then flush the two instructions fetched behind it:

```text
if (PCbr_2 == 1) {
    IF_DR_CLR = 1;     // flush instruction in IF/DR
    DR_EX_CLR = 1;     // flush instruction in DR/EX
} else {
    IF_DR_CLR = CLR_ORG;
    DR_EX_CLR = CLR_ORG;
}
```

---

## Flush Datapath Changes

flowchart TD J["jal in EX: PCbr_2 = 1"] --> P["PC <- ALU result (jump target)"] J --> F1["IF_DR_CLR = 1"] J --> F2["DR_EX_CLR = 1"] F1 --> N["wrongly-fetched instructions become bubbles"] F2 --> N
PCBr MUX selector must be `PCbr_2` (EX stage), NOT `PCbr_4`. Wrong stage = wrong selector = flush bug.
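
The flush rule is small enough to model directly. The sketch below assumes a hypothetical function name `flush_controls` and a default `clr_org=0` for the original clear line; it only restates the pseudocode from the Control Hazards slide.

```python
def flush_controls(pcbr_2, clr_org=0):
    """Return (IF_DR_CLR, DR_EX_CLR) for one cycle.

    Hypothetical model: pcbr_2 is the EX-stage "PC is being redirected"
    flag, clr_org is an assumed default for the original clear line.
    """
    if pcbr_2 == 1:
        # Both instructions fetched behind the jump become bubbles.
        return (1, 1)
    return (clr_org, clr_org)


print(flush_controls(pcbr_2=1))  # (1, 1)
print(flush_controls(pcbr_2=0))  # (0, 0)
```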

---

## Decision Procedure: Forward / Stall / Flush

flowchart TD START["Dependency between producer A and consumer B?"] --> Q1{"Control change? jal/branch"} Q1 -- yes --> FLUSH["FLUSH IF/DR and DR/EX"] Q1 -- no --> Q2{"Producer is LOAD and B is immediately next?"} Q2 -- yes --> STALL["STALL 1 cycle, then FORWARD"] Q2 -- no --> Q3{"Result in EX/MEM or MEM/WB when B is in EX?"} Q3 -- yes --> FWD["FORWARD only"] Q3 -- no --> NONE["No action needed"]

---

## Hazard Summary Table

| Situation | Mechanism |
|-----------|-----------|
| ALU result, 1-2 instructions later | Forward (EX/MEM or MEM/WB) |
| Load result used by next instruction | Stall 1 + forward |
| Two loads, then combined use | Stall 1 + forward both |
| `jal`/`jalr`/taken branch | Flush (clear IF/DR and DR/EX) |
| No dependency | None |

---

## Counting Clock Cycles

**Formula:** `cycles = n + 4 + stalls`

- First instruction takes 5 cycles to WB
- Each additional instruction adds 1 cycle
- Add 1 cycle per load-use stall
- Flushes waste fetch slots but don't add to the last-WB count

| Program | n | Stalls | Cycles |
|---------|---|--------|--------|
| `addi; addi; add` | 3 | 0 | **7** |
| `addi; sd; ld; add` | 4 | 1 | **9** |
| `ld; ld; add` | 3 | 1 | **8** |

Always anchor on "when does the **last** instruction reach WB": draw the diagram first, use the formula as a check.
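
As a check on the formula, the three programs from the table can be run through a one-line Python helper. The name `total_cycles` is hypothetical; the arithmetic is exactly `n + 4 + stalls` from above.

```python
def total_cycles(n_instructions, stalls=0):
    """Cycle in which the last instruction reaches WB: n + 4 + stalls."""
    if n_instructions < 1:
        raise ValueError("need at least one instruction")
    return n_instructions + 4 + stalls


programs = [
    ("addi; addi; add", 3, 0),
    ("addi; sd; ld; add", 4, 1),
    ("ld; ld; add", 3, 1),
]
for name, n, stalls in programs:
    print(f"{name}: {total_cycles(n, stalls)} cycles")
# addi; addi; add: 7 cycles
# addi; sd; ld; add: 9 cycles
# ld; ld; add: 8 cycles
```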

---

## Add-an-Instruction: LWU

**LW** vs **LWU**: both load a 32-bit word; they differ in the upper 32 bits of the 64-bit register.

```text
LW  t0, (a0):  t0 = signext(mem[a0][31:0])   // copy bit 31 to bits 63..32
LWU t0, (a0):  t0 = zeroext(mem[a0][31:0])   // fill bits 63..32 with 0
```

```text
               +-- signext(word) --+
word[31:0] ----+                   |--MUX--> 64-bit value
               +-- zeroext(word) --+    ^
                                      ZEXT (1=LWU, 0=LW)
```

- **Opcode** `0000011` (load group), **funct3 = 110** (LWU) vs `010` (LW)
- **Decoder:** detect funct3 == 110, assert `ZEXT = 1`; keep normal load controls

---

## LWU Changes: Full Recipe

| What | Change |
|------|--------|
| **Datapath / component** | Add 2-input MUX selecting sign-extend vs. zero-extend of the loaded word |
| **Control line** | New `ZEXT` line; 1 for LWU, 0 for LW |
| **Instruction decoder** | Detect opcode `0000011` with funct3 `110`; assert `ZEXT=1`, keep RFW=1, M2R=1, ALU = base+offset |

Same recipe applies to **LBU** (load byte unsigned): funct3 = `100`, zero-extend bits 63..8.

---

## Add-an-Instruction: SWPR

`swpr a0, a1`: swap two registers in one instruction

```text
# without SWPR:        # with SWPR:
mv t0, a0              swpr a0, a1
mv a0, a1
mv a1, t0
```

Problem: the standard RegFile writes only one register per cycle. A swap must write **two**.

---

## SWPR: Two Write Ports

flowchart LR RD0["RD0 = old a0"] --> P2["Port2: WR2=a1, WD2=RD0"] RD1["RD1 = old a1"] --> P1["Port1: WR=a0, WD=RD1"] P1 --> RF[("RegFile\n2 write ports")] P2 --> RF SW["SWPR: WE=1, WE2=1"] --> RF
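
The two-port write in the diagram above can be modeled in Python to see why both old values must be read before either write lands. This is a rough sketch, not the Digital component: `regfile_write` is a hypothetical name, registers are a plain list, and port 1 is applied last so that it wins when both ports target the same register (the collision policy this section adopts).

```python
def regfile_write(regs, wr, wd, we, wr2, wd2, we2):
    """Return a new register list after one clock edge with two write ports.

    Hypothetical model: port 2 is applied first and port 1 last, so
    port 1 (WD) wins if both ports target the same register.
    """
    updated = list(regs)
    if we2:
        updated[wr2] = wd2
    if we:
        updated[wr] = wd
    return updated


regs = [0] * 32
regs[10], regs[11] = 5, 9          # a0 = x10, a1 = x11
# swpr a0, a1: RD0/RD1 are captured before the write edge.
rd0, rd1 = regs[10], regs[11]
regs = regfile_write(regs, wr=10, wd=rd1, we=1, wr2=11, wd2=rd0, we2=1)
print(regs[10], regs[11])          # 9 5
```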

- Add second write port: `WR2`, `WD2`, `WE2`
- Port 1: `WR=a0`, `WD=RD1` (old a1), `WE=1`
- Port 2: `WR2=a1`, `WD2=RD0` (old a0), `WE2=1`
- **Collision policy:** if both ports target same register, give port 1 (`WD`) priority

---

## SWPR Changes: Full Recipe

| What | Change |
|------|--------|
| **RegFile** | Add second write port `WR2`/`WD2`/`WE2`; collision MUX for same-register case |
| **Datapath** | Wire `RD0 -> WD2`, `RD1 -> WD`; route both dest reg numbers to `WR`/`WR2` |
| **Control** | New `SWPR` line asserts both `WE`/`WE2`; selects swap write-data routing |
| **Decoder** | Assign unused opcode/funct encoding; assert swap controls; no immediate, no ALU op |

Same RegFile machinery enables **ADDI2** (add immediate to two registers).

---

## Project 07 Rubric

| Points | Test | What it exercises |
|--------|------|-------------------|
| 10 | `00-add-3nop`, `01-add-2nop`, `02-jal`, `03-ld` | Starter + invert RegFile clock |
| 50 | `04-add-fwd` | Forwarding (FRD0/FRD1) |
| 20 | `05-ld-stl` | Load-use stall + bubble |
| 10 | `06-jal-fls` | Jump/branch flush |
| 5 | `07-branch` | Conditional branch |
| 5 | `08-fibrec` | Full program: `fibrec(10) = 55` |

Cheapest win: invert the CLK to the RegFile in DR. This alone passes `01-add-2nop`, `02-jal`, and `03-ld`.

---

## Submission Structure

Top-level file must be named `project07.dig` at the expected path. Most common failure: files end up one folder too deep.

- The autograder result is final; there is no interactive grading
- Fix structure issues and push immediately to avoid extra penalties

**Build the Hazard Unit incrementally:**

1. Forwarding first (`04-add-fwd`, 50 pts)
2. Then load-use stall (`05-ld-stl`)
3. Then flush (`06-jal-fls`, `07-branch`)
4. Run `08-fibrec` as an integration test

---

## Common Wiring Mistakes

| Mistake | Fix |
|---------|-----|
| EX MUX output not replacing `RD0`/`RD1` everywhere | Wire MUX output to ALU, store data, branch unit |
| Forwarding priority backwards (testing `_4` before `_3`) | Always test `_3` (EX/MEM) first |
| Not preserving `EN_ORG`/`CLR_ORG` in else branch | Copy original EN/CLR lines to else case |
| PCBr MUX selector from `PCbr_4` instead of `PCbr_2` | Use the EX-stage flag for flush timing |

---

## Debugging Procedure

1. **Paste the `.dig` test** into your processor and run it directly in Digital
2. **`objdump`** the test to see hex instruction words alongside assembly
3. **Single-step** to find the first instruction that misbehaves
4. **Add probes** on `FRD0`, `FRD1`, `ALUR_3`, `MR_4`, EN/CLR lines to confirm whether the bug is a wrong control line or wrong value

Build incrementally: get forwarding working before touching stalls or flushes. The tests are ordered by complexity.

---

## Key Concepts Reference

| Concept | Definition |
|---------|------------|
| **Forwarding** | Route result from EX/MEM or MEM/WB back into EX inputs |
| **Load-use hazard** | Load value needed by very next instruction before it's ready |
| **Stall (bubble)** | Freeze earlier stages; inject NOP downstream |
| **Control hazard** | Jump/taken branch makes already-fetched instructions invalid |
| **Flush** | Clear pipeline registers so wrong instructions become bubbles |
| **Clock inversion** | Write RegFile on one edge, read on the other; removes one NOP |
| **Sign vs zero ext** | LW copies bit 31; LWU fills upper 32 bits with 0 |
| **Second write port** | RegFile writes two registers in one cycle (SWPR) |

---

## Summary

1. **Stage suffixes (`_2`, `_3`, `_4`)** are the shared language of every Hazard Unit equation
2. **Draw the timing diagram first**: place cycles across the top, instructions down the side; read the last WB cycle
3. **Forwarding resolves ALU-to-ALU hazards**: closest producer wins; test `_3` before `_4`; only forward if `RFW` is set
4. **Load-use hazards need stall + forward**: freeze PC/IF-DR/DR-EX, clear EX/MEM; one stall suffices even for back-to-back loads
5. **Control hazards need a flush**: resolve in EX using `PCbr_2`; clear IF/DR and DR/EX
6. **Adding instructions follows a fixed recipe**: datapath, control line, component, decoder
7. **Submission**: top-level `project07.dig`, build incrementally, invert RegFile clock for easy early points