# Processor Pipeline Hazards

## CS 315 Computer Architecture

---

## Overview

**Pipelining** overlaps instruction execution to raise throughput, but it also creates **hazards**. A hazard is any situation where the pipeline overlap causes an instruction to use a value (or PC) that is not yet ready.

**Four modifications** fix hazards in Project 7:

1. Invert the register-file clock
2. Data forwarding
3. Load-use stalling
4. Control-hazard flushing

---

## Single-Cycle vs. Pipelined

**Single-cycle**: one long clock period per instruction; most of the hardware sits idle

**Pipelined**: split the datapath into stages and overlap multiple instructions

```text
Single-cycle:
I1: [=====]
I2:        [=====]
I3:               [=====]

Pipelined:
I1: F D E M W
I2:   F D E M W
I3:     F D E M W    <- one completes per cycle once full
```
Pipelining improves **throughput**, not latency. A single instruction still passes through all stages.
---

## The 5-Stage Pipeline
flowchart LR A["IF\nInstruction Fetch"] --> B["DR\nDecode / RegRead"] B --> C["EX\nExecute / ALU"] C --> D["MEM\nMemory"] D --> E["WB\nWrite Back"]
| Stage | Work Performed |
|-------|----------------|
| **IF** | Read instruction memory; compute PC+4 |
| **DR** | Decode; read source registers (RD0, RD1) |
| **EX** | ALU operation; compute branch target |
| **MEM** | Load from / store to data memory |
| **WB** | Write result back to register file |

---

## Pipeline Registers

Between every pair of stages sits a **pipeline register** that captures intermediate values so the next stage can use them in the next cycle.
flowchart LR IF["IF"] -->|"IF/DR"| DR["DR"] DR -->|"DR/EX"| EX["EX"] EX -->|"EX/MEM"| MEM["MEM"] MEM -->|"MEM/WB"| WB["WB"]
| Register | Carries |
|----------|---------|
| **IF/DR** | Instruction word, PC, PC+4 |
| **DR/EX** | RD0, RD1, immediate, RR0, RR1, WR, control lines |
| **EX/MEM** | ALU result, RD1, WR, control lines |
| **MEM/WB** | Memory read data, ALU result, WR, RFW |

---

## Pipeline Register Controls

Each pipeline register has two key control inputs:

- **EN (enable)**: when `0`, the register holds its value and does not load new data. This **stalls** the stage.
- **CLR (clear)**: when `1`, the register outputs are zeroed on the next edge, injecting a **bubble** (effective `nop`). This **flushes** the instruction.
**Stall** = freeze the front; **Flush** = replace an instruction with a bubble.
---

## Cycle Count Formula

With no hazards or stalls, for `m` instructions in a `k`-stage pipeline:

```text
total cycles = k + (m - 1)
```

For 5 stages and 3 independent instructions:

```text
total = 5 + (3 - 1) = 7 cycles
```

Each **stall bubble** adds exactly 1 cycle. A flush turns an already-fetched wrong-path instruction into a bubble. It adds no cycles beyond the slots already lost to the redirect.

---

## The Four Modifications

| # | Modification | Hazard Fixed | Mechanism |
|---|--------------|--------------|-----------|
| 1 | Invert RegFile clock | Data (RAW) | Write in 1st half, read in 2nd half |
| 2 | Forwarding | Data (RAW) | Route EX/MEM or MEM/WB result back to EX |
| 3 | LD stalling | Load-use | Freeze pipeline 1 cycle |
| 4 | Control-hazard flush | Jumps / branches | Update PC early; flush wrong-path instructions |
Each modification removes some required `nop`s. Implement and test them in order.
---

## Data Hazard: Read-After-Write (RAW)

A **RAW hazard** occurs when an instruction needs a value that an earlier in-flight instruction has not yet written back.

```asm
addi a1, zero, 3    # writes a1 in WB
addi a2, zero, 4    # writes a2 in WB
add  a0, a1, a2     # reads a1, a2 in DR -- too early!
```

On the bare pipeline, `add` reads its operands in DR before the `addi`s reach WB, so it reads **stale values**. The starter needs **3 `nop`s** between producer and consumer to avoid this.

---

## Why 3 nops on the Starter

WB is 3 stages after DR. With 3 `nop`s, the producer's WB lands just before the consumer's DR:

```text
cycle:         1  2  3  4  5  6  7  8  9  10
addi a1,3    : F  D  E  M  W
addi a2,4    :    F  D  E  M  W
nop          :       F  D  E  M  W
nop          :          F  D  E  M  W
nop          :             F  D  E  M  W
add a0,a1,a2 :                F  D  E  M  W
```

3 `nop`s + 3 real instructions = 6 instructions → **10 cycles**

---

## Mod 1: Invert the RegFile Clock

**Idea**: WB writes in the **first half** of a cycle; DR reads in the **second half** of the same cycle.

```text
        |<-------- one clock cycle -------->|
CLK:    |-----------------|_________________|
~CLK:   |_________________|-----------------|
          1st half:         2nd half:
          WB writes         DR reads
```

The writer goes first, then the reader sees the fresh value in the **same** cycle.

**Result**: one fewer `nop` needed (3 → 2), so 10 cycles → **9 cycles**.

---

## Mod 1: Effect on the Cycle Diagram

```text
cycle:         1  2  3  4  5  6  7  8  9
addi a1,3    : F  D  E  M  W
addi a2,4    :    F  D  E  M [W]       <- WB writes a2 in 1st half of cycle 6
nop          :       F  D  E  M  W
nop          :          F  D  E  M  W
add a0,a1,a2 :             F [D] E  M  W   <- DR reads a2 in 2nd half of cycle 6
```
Implementation: route an inverted CLK into the RegFile's clock input in Digital. This makes test cases `01-add-2nop`, `02-jal`, and `03-ld` pass.
---

## Mod 2: Forwarding (Bypassing)

**Key insight**: the value the consumer needs already exists in the pipeline; it just has not been written back yet.

When `add` is in EX (cycle 5), the producers' results are sitting in pipeline registers:

```text
cycle:         1  2  3  4  5  6  7
addi a1,3    : F  D  E  M [W]        <- result in MEM/WB
addi a2,4    :    F  D  E [M] W      <- result in EX/MEM
add a0,a1,a2 :       F  D [E] M  W   <- needs a1, a2 right NOW
```

Route those results **directly back** to the EX-stage ALU inputs, and no `nop`s are needed.

---

## Forwarding Datapath

Insert a MUX in front of each ALU input:
flowchart TD RD0["RD0 from DR/EX"] --> M0{{"RD0 MUX\nsel = FRD0"}} ALUR3["ALUR_3\n(EX/MEM result)"] --> M0 MR4["MR_4\n(MEM/WB result)"] --> M0 M0 --> A["ALU input A"] RD1["RD1 from DR/EX"] --> M1{{"RD1 MUX\nsel = FRD1"}} ALUR3b["ALUR_3"] --> M1 MR4b["MR_4"] --> M1 M1 --> B["ALU input B"]
| Selector | Source |
|----------|--------|
| `0` | Register-file value (no hazard) |
| `1` | `MR_4` from MEM/WB |
| `2` | `ALUR_3` from EX/MEM |

---

## Hazard Unit: Forwarding Logic

```c
// Forwarding selector for ALU input A (RD0)
if ((RR0_2 == WR_3) && RFW_3) {
    FRD0 = 2;  // forward EX/MEM ALU result (closer = higher priority)
} else if ((RR0_2 == WR_4) && RFW_4) {
    FRD0 = 1;  // forward MEM/WB result
} else {
    FRD0 = 0;  // no hazard: use register file
}
// FRD1 is identical but uses RR1_2
```
**Priority rule**: check stage 3 (EX/MEM) before stage 4 (MEM/WB). The closest producer holds the most recent value.
---

## Forwarding Priority: Why It Matters

```asm
addi a0, zero, 3    # writes a0 = 3 (older, in MEM/WB)
addi a0, zero, 4    # writes a0 = 4 (newer, in EX/MEM)
add  a0, a0, a0     # correct result: 4 + 4 = 8
```

- Correct logic (stage 3 first): `FRD0 = 2` → forwards `4` → `4 + 4 = 8`
- Swapped logic (stage 4 first): `FRD0 = 1` → forwards `3` → `3 + 3 = 6` **WRONG**

The **closest** (most recent) producer must always win. Forwarding is checked by test `04-add-fwd`, worth **50 pts**.

---

## Mod 3: Load-Use Hazard

Forwarding works for ALU results, which are available at the end of EX. A **load** (`ld`), however, produces its value in **MEM**, one stage too late to forward directly.

```text
cycle:        1  2  3  4  5  6
ld a2,(a0)  : F  D  E [M] W          <- a2 ready at end of MEM (cycle 4)
addi a0,a2, :    F  D [E] ...        <- needs a2 at start of EX (cycle 4): TOO EARLY
```

The consumer's EX and the load's MEM fall in the **same** cycle. The data does not exist yet, so we must **stall one cycle**.

---

## How to Stall

The Hazard Unit does two things simultaneously:

1. **Freeze** the front of the pipeline (EN = 0 on PC, IF/DR, DR/EX)
2. **Inject a bubble** forward (CLR = 1 on EX/MEM)

```text
cycle:        1  2  3  4  5  6  7
ld a2,(a0)  : F  D  E  M  W
addi a0,a2, :    F  D  E  E  M  W    <- EX repeated (held in DR/EX)
                       ^ cycle 4: stall detected; PC, IF/DR, DR/EX held;
                         EX/MEM cleared, so MEM holds a bubble in cycle 5
```

After the stall, the load value is in MEM/WB, and ordinary forwarding (`FRD = 1`) delivers it to `addi`'s repeated EX.

---

## Hazard Unit: Load-Stall Logic

```c
// Load-use hazard: EX/MEM holds a load that writes a register
// needed by the instruction in EX (whose source registers are in DR/EX)
if (RFW_3 && MLD_3 && ((RR0_2 == WR_3) || (RR1_2 == WR_3))) {
    PC_EN      = 0;        // freeze PC
    IF_DR_EN   = 0;        // freeze IF/DR
    DR_EX_EN   = 0;        // freeze DR/EX
    EX_MEM_CLR = 1;        // inject bubble into EX/MEM
} else {
    PC_EN      = EN_ORG;   // preserve original behavior
    IF_DR_EN   = 1;
    DR_EX_EN   = 1;
    EX_MEM_CLR = CLR_ORG;
}
```
`MLD_3` is the "memory load" control line in EX/MEM. The unit must preserve `EN_ORG`/`CLR_ORG` when no stall is active, so that single-step debugging still works.
---

## Mod 4: Control Hazards

Jumps (`jal`, `jalr`) and taken branches change the PC, but the pipeline keeps fetching **sequential instructions**, which are on the wrong path.

```asm
main:
    li   a0, 3
    jal  foo            # jumps to foo
    unimp               # should NOT execute (wrong path!)
foo:
    addi a0, a0, 4      # a0 should be 7
```

The bare starter needs **4 `nop`s** after a `jal` so that the wrong-path slots contain only harmless instructions.

---

## Resolving Control Hazards

Two coordinated changes:

1. **Update PC early**: compute the target in EX (from the ALU result), selected by `PCbr_2`
2. **Flush wrong-path instructions** in IF/DR and DR/EX

```text
cycle:          1  2  3  4  5  6
jal foo       : F  D  E                <- PC redirected in EX
unimp (wrong) :    F  D  *flush*       <- DR/EX cleared to bubble
???? (wrong)  :       F  *flush*       <- IF/DR cleared to bubble
addi a0,..    :          F  D  E       <- correct target fetched
```

Only **2 instructions** are fetched on the wrong path, and both are flushed.

---

## Hazard Unit: Control-Hazard Flush Logic

```c
// Control hazard: jump or taken branch detected in EX
if (PCbr_2 == 1) {
    IF_DR_CLR = 1;  // flush instruction in IF/DR
    DR_EX_CLR = 1;  // flush instruction in DR/EX
} else {
    IF_DR_CLR = CLR_ORG;  // preserve original behavior
    DR_EX_CLR = CLR_ORG;
}
```
flowchart TD A["Instruction in EX"] --> B{"PCbr_2 == 1?\njump or taken branch"} B -- yes --> C["PC = ALU result from EX"] C --> D["IF_DR_CLR = 1\nDR_EX_CLR = 1"] B -- no --> E["PC = PC + 4\nno flush"]
---

## The Complete Hazard Unit

One combinational block that observes the pipeline and produces all control signals:
flowchart LR subgraph Inputs I1["RR0_2 / RR1_2"] I2["WR_3 / WR_4"] I3["RFW_3 / RFW_4"] I4["MLD_3"] I5["PCbr_2"] I6["EN_ORG / CLR_ORG"] end HU["Hazard Unit"] subgraph Outputs O1["FRD0 / FRD1"] O2["PC_EN, IF_DR_EN, DR_EX_EN"] O3["EX_MEM_CLR"] O4["IF_DR_CLR, DR_EX_CLR"] end I1 --> HU I2 --> HU I3 --> HU I4 --> HU I5 --> HU I6 --> HU HU --> O1 HU --> O2 HU --> O3 HU --> O4
---

## Hazard Unit Output Summary

| Signal | Type | Purpose |
|--------|------|---------|
| `FRD0`, `FRD1` | Forwarding select | Choose ALU input source (0/1/2) |
| `PC_EN` | Enable | Freeze PC during load stall |
| `IF_DR_EN`, `DR_EX_EN` | Enable | Freeze pipeline regs during load stall |
| `EX_MEM_CLR` | Clear | Inject bubble for load stall |
| `IF_DR_CLR`, `DR_EX_CLR` | Clear | Flush wrong-path instructions for control hazard |

---

## Performance: nop Count vs. Cycles

| Version | Instructions (m) | Cycles |
|---------|------------------|--------|
| Starter (3 nops) | 2 + 3 + 1 = 6 | 10 |
| Inverted clock (2 nops) | 2 + 2 + 1 = 5 | 9 |
| Forwarding (0 nops) | 2 + 0 + 1 = 3 | 7 |

Formula: `cycles = k + (m - 1) = 5 + (m - 1)`

Each modification removes `nop`s and reduces `m`, lowering the total cycle count.

---

## Project 7 Test Sequence

| Order | Modification | Test | Points |
|-------|--------------|------|--------|
| 1 | Invert RegFile clock | `00`, `01`, `02`, `03` | 10 |
| 2 | Forwarding (FRD0/FRD1 MUXes) | `04-add-fwd` | 50 |
| 3 | Load stalling | `05-ld-stl` | 20 |
| 4 | Control-hazard flush (jump) | `06-jal-fls` | 10 |
| 4b | Control-hazard flush (branch) | `07-branch` | 5 |
| - | Full program | `08-fibrec` | 5 |
Implement in order: each modification makes a specific test pass and builds on the previous one.
---

## Hazard Identification Quick Reference

| Situation | Hazard Type | Fix |
|-----------|-------------|-----|
| `add t0,t1,t2` then `sub t3,t0,t4` | RAW data hazard | Forwarding from EX/MEM |
| `ld t0,(a0)` then `add t3,t0,t4` | Load-use hazard | 1-cycle stall + forward |
| `jal foo` then wrong-path instrs | Control hazard | Flush IF/DR and DR/EX |
| `beq t0,t1,L` (taken) then wrong-path | Control hazard | Flush IF/DR and DR/EX |

---

## Key Formulas and Conditions

**Forwarding** (check stage 3 first):

```c
FRD0 = (RR0_2 == WR_3 && RFW_3) ? 2
     : (RR0_2 == WR_4 && RFW_4) ? 1
     : 0;
```

**Load stall** condition:

```c
stall = RFW_3 && MLD_3 && (RR0_2 == WR_3 || RR1_2 == WR_3);
```

**Control flush** condition:

```c
flush = (PCbr_2 == 1);
```

---

## Summary

1. A **5-stage pipeline** (IF, DR, EX, MEM, WB) overlaps instructions to raise throughput; pipeline registers carry values between stages.
2. **Hazards** arise from overlap: a data hazard reads a not-yet-written value; a control hazard fetches wrong-path instructions.
3. **Invert the RegFile clock**: WB writes in the first half and DR reads in the second half, which removes 1 `nop` (10 → 9 cycles).
4. **Forwarding**: route EX/MEM (`ALUR_3`) or MEM/WB (`MR_4`) results back to EX via the `FRD0`/`FRD1` MUXes; the closest producer has priority.
5. **Load-use stall**: freeze PC/IF_DR/DR_EX (`EN=0`) and inject a bubble (`EX_MEM_CLR=1`) for one cycle when a load feeds the immediately following instruction.
6. **Control-hazard flush**: redirect the PC from EX (using `PCbr_2`) and clear IF/DR and DR/EX to bubbles; no `nop`s are needed in the source code.
7. **All logic lives in one Hazard Unit**: it is combinational, reads register numbers and control lines, and preserves the original `EN`/`CLR` values when no hazard is active.