Lab: Project 07 and Final Review
Overview
This hands-on lab wrapped up the semester by combining two goals: finishing the Project 07 pipelined-processor Hazard Unit and reviewing for the final exam. We worked through the Lab 11 exam-like problems on paper, drew pipeline timing diagrams cycle-by-cycle for forwarding, stalling, and flushing, and reasoned about how to extend the single-cycle and pipelined datapaths with new instructions (LWU, SWPR). The emphasis throughout was procedural: how to trace a program through the five stages, how to count clock cycles, and how to decide which hazard mechanism (forward, stall, or flush) each dependency requires.
Learning Objectives
- Draw a cycle-by-cycle pipeline timing diagram (IF, DR, EX, MEM, WB) for a short RISC-V program
- Count the total clock cycles needed for a program on a fully hazard-managed pipeline
- Decide when a dependency needs forwarding versus a stall versus a flush
- Explain why inverting the RegFile clock removes one `nop` and how forwarding removes more
- Trace the Hazard Unit logic (FRD0/FRD1, stall enables, flush clears) against the Project 07 spec
- Extend the single-cycle datapath for sign-vs-zero extension (LWU) and for a two-register write (SWPR)
- Identify and fix the most common Project 07 submission and wiring mistakes
Prerequisites
- Project 06 single-cycle RISC-V processor (datapath, control, instruction decoder)
- The five-stage pipeline and pipeline registers (IF/DR, DR/EX, EX/MEM, MEM/WB)
- RISC-V instruction formats and the meaning of `addi`, `add`, `ld`, `sd`, `jal`, `beq`
- Two's-complement sign extension versus zero extension
- Digital (the logic simulator): MUXes, comparators, RAM, ROM, registers with EN/CLR
1. Session Roadmap and Final Exam Format
This was the last working session before the final. The plan was to use the Lab 11 exam-like problems as the review backbone, then return to Project 07 mechanics so everyone could finish the Hazard Unit.
What the final exam will look like:
- Color circuit diagrams. Expect to read and annotate a datapath drawn in color. Lines you add (new datapath wires, MUXes, control lines) should be drawn clearly, the way the lab solutions were drawn.
- Cycle counting and pipelining. Given a short program and a fully hazard-managed pipeline, count the clock cycles and identify every forward, stall, and flush.
- Processor design / "add an instruction." Given the single-cycle or pipelined datapath, describe the datapath, control-path, component, and instruction-decoder changes needed to support a new instruction.
- Sum-of-products and digital design. Truth tables to boolean equations to gate-level circuits (majority, XOR3, Max3, Sort2).
The project itself is cumulative in how it is weighted at the course level: roughly 25% pre-midterm content and 75% post-midterm content. The autograder is run as-is; any grading adjustments are handled afterward.
flowchart LR
A[Lab 11 problems] --> B[Pipeline timing diagrams]
B --> C[Hazard reasoning: forward / stall / flush]
C --> D[Project 07 Hazard Unit]
A --> E[Add-an-instruction: LWU, SWPR]
E --> F[Final exam prep]
C --> F
style B fill:#f9f,stroke:#333,stroke-width:2px
style C fill:#f9f,stroke:#333,stroke-width:2px
2. The Five-Stage Pipeline: Vocabulary We Will Reuse All Session
Every timing diagram in this lab uses the same five stages. Memorize the abbreviations because the diagrams are written entirely in them.
| Stage | Letter | What happens |
|---|---|---|
| Instruction Fetch | I (IF) | Read the instruction word from instruction memory at PC; compute PC+4 |
| Decode / Register read | D (DR) | Decode the instruction; read RR0/RR1 from the RegFile into RD0/RD1; build the immediate |
| Execute | E (EX) | ALU computes the result; Branch Unit compares; branch/jump target computed |
| Memory | M (MEM) | Load or store to data RAM (ld/sd); other instructions pass through |
| Write Back | W (WB) | Write the result back into the RegFile (WR/WD/WE) |
Between every pair of stages sits a pipeline register that carries that instruction's signals forward one stage per clock: IF/DR, DR/EX, EX/MEM, and MEM/WB.
A signal name in Project 07 carries a stage suffix: `_2` means the value comes out of the DR/EX register (the instruction entering EX), `_3` means EX/MEM (the instruction in MEM), and `_4` means MEM/WB (the instruction in WB). So `RR0_2` is the read-register-0 selector of the instruction currently entering EX, `WR_3` is the destination register of the instruction in MEM, and `RFW_4` is the register-file-write flag of the instruction in WB.
flowchart LR
PC[PC] --> IFDR[IF/DR reg]
IFDR --> DREX[DR/EX reg]
DREX --> EXMEM[EX/MEM reg]
EXMEM --> MEMWB[MEM/WB reg]
MEMWB --> RF[(RegFile)]
RF -. read RD0/RD1 .-> DREX
3. Reading a Pipeline Timing Diagram (Worked: Three addi/add)
The first board we drew is Question 7, part (4) from Lab 11. The program is two `addi` instructions that set `a1` and `a2`, followed by an `add` that reads both and writes `a0`. It runs on a complete Project 07 solution (clock inversion + forwarding + stalling + flushing).
A timing diagram places cycles across the top and one instruction per row. Each instruction occupies one stage per cycle, shifted right by one as the next instruction follows it down the pipe.
cyc1 cyc2 cyc3 cyc4 cyc5 cyc6 cyc7
addi a1 | I | D | E | M | (W) | | |
addi a2 | | I | D | (E) | (M) | W | |
add a0 | | | I | D | (E) | M | W |
The add finishes its WB in cycle 7. That circled (7) was the answer on the board.
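The layout rule (one row per instruction, each row shifted right by one cycle) can be sketched in a few lines of Python. This is only an illustration for a program with no stalls or flushes; the names `STAGES`, `timing_rows`, `last_wb_cycle`, and `render` are made up for this sketch and are not part of Project 07.

```python
STAGES = ["I", "D", "E", "M", "W"]

def timing_rows(instrs):
    """One dict per instruction mapping cycle -> stage, assuming no stalls or flushes."""
    # Instruction i (0-based) is fetched in cycle i + 1, then advances one stage per cycle.
    return [{i + 1 + s: STAGES[s] for s in range(len(STAGES))}
            for i in range(len(instrs))]

def last_wb_cycle(instrs):
    """Cycle in which the last instruction is in WB."""
    return max(c for row in timing_rows(instrs) for c, st in row.items() if st == "W")

def render(instrs):
    """Text version of the board diagram."""
    rows = timing_rows(instrs)
    total = last_wb_cycle(instrs)
    lines = []
    for name, row in zip(instrs, rows):
        cells = [row.get(c, " ") for c in range(1, total + 1)]
        lines.append(f"{name:<8}| " + " | ".join(cells) + " |")
    return "\n".join(lines)

program = ["addi a1", "addi a2", "add a0"]
print(render(program))
print("last WB in cycle", last_wb_cycle(program))  # 7 for this program
```

Reading column 5 of the output from top to bottom gives W, M, E, which is the alignment the forwarding argument relies on.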
Why forwarding, not stalling, is enough here
The add reads a1 and a2 in its DR stage (cycle 4), but the producers of a1 and a2 (the two addis) have not yet written back. Look at the vertical alignment in cycle 5: the second addi is in MEM (M) and the add is entering EX (E). The first addi is in WB (W).
- The `add` needs `a1`: produced by `addi a1` (now in WB). Forward from the MEM/WB side into EX.
- The `add` needs `a2`: produced by `addi a2` (now in MEM). Forward from the EX/MEM (ALU result) side into EX.
Both values exist somewhere in the pipeline by the time the add is in EX, so forwarding alone resolves the hazard. The circled forwarding arrows on the board ran from the addi results back down to the add's EX inputs. No stall is needed, no bubble is inserted.
Part (5) answer: No flushes and no stalls are needed for this program — only forwarding. The pipeline runs at full rate.
Contrast with the starter pipeline (no Hazard Unit): the `add` would read the RegFile in its DR stage (cycle 4) before either `addi` has written back, getting stale `a1`/`a2` (both `0` if those registers started at 0), so `a0` would be `0`. That is Question 7 parts (1)-(3): the wrong result is `0`, and you would fix it by inserting `nop`s between the instructions.
4. The Forwarding Hazard Unit (Project 07, test 04-add-fwd)
Forwarding is the highest-value piece of Project 07 (50 of 100 points). The idea: instead of waiting for a value to travel all the way to WB and back into the RegFile, route it directly from a later stage back into the EX inputs.
Datapath additions
- Bring two already-computed values back to EX:
  - `ALUR_3`: the ALU result sitting in the EX/MEM register (the instruction one ahead, in MEM).
  - `MR_4`: the MEM/WB MUX output (the instruction two ahead, in WB).
- Add two 3-input MUXes in the EX stage, one per operand:
  - RD0 MUX: inputs `{ RD0, MR_4, ALUR_3 }` on select values 0, 1, 2; selector `FRD0`.
  - RD1 MUX: inputs `{ RD1, MR_4, ALUR_3 }` on select values 0, 1, 2; selector `FRD1`.
- The MUX output feeds everywhere the original `RD0`/`RD1` used to connect (ALU input, store data, branch unit, etc.).
The input order matches the selector values in the Hazard Unit logic: `0` selects the RegFile value, `1` selects `MR_4`, and `2` selects `ALUR_3`.
            +-----------+
  RD0 ----->|0          |
  MR_4 ---->|1   MUX    |---> to ALU A (and wherever RD0 went)
  ALUR_3 -->|2          |
            +-----------+
                  ^
            FRD0 (from Hazard Unit)
Hazard Unit logic (for FRD0; FRD1 is symmetric with RR1_2)
if ((RR0_2 == WR_3) && (RFW_3)) {
FRD0 = 2; // forward ALU result from EX/MEM (closest producer)
} else if ((RR0_2 == WR_4) && (RFW_4)) {
FRD0 = 1; // forward from MEM/WB
} else {
FRD0 = 0; // no hazard: use the value read from the RegFile
}
Two things to internalize:
- Match the read register against the write register of an instruction in a later stage (an earlier instruction in program order), and forward only if that instruction actually writes the RegFile (`RFW`). An instruction that does not write the RegFile must never be a forwarding source.
- Closest wins. Test EX/MEM (`_3`) before MEM/WB (`_4`). The nearer instruction holds the most recent value.
The board example for "closest wins" is the sequence `addi a0, zero, 3`; `addi a0, zero, 4`; `add a0, a0, a0`.
When the add is in EX, the addi a0, zero, 4 is in MEM (its result is ALUR_3 = 4) and addi a0, zero, 3 is in WB (MR_4 = 3). Both match a0, so the priority order forwards ALUR_3 = 4. Getting the priority backwards is the single most common forwarding bug.
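The priority logic can be checked against the board example with a small Python model. This is a sketch, not the Digital circuit: the function name `forward_operand` and the use of register numbers (a0 = x10) are choices made for this illustration, and it returns the selector together with the chosen value rather than modelling the MUX wiring.

```python
def forward_operand(rr_2, rd, wr_3, rfw_3, alur_3, wr_4, rfw_4, mr_4):
    """Return (selector, value) for one EX operand, mirroring the FRD0 logic above."""
    if rr_2 == wr_3 and rfw_3:
        return 2, alur_3      # closest producer: result waiting in EX/MEM
    if rr_2 == wr_4 and rfw_4:
        return 1, mr_4        # older producer: value in MEM/WB
    return 0, rd              # no hazard: value read from the RegFile

A0 = 10  # a0 is register x10

# add a0, a0, a0 in EX; addi a0, zero, 4 in MEM; addi a0, zero, 3 in WB
sel, value = forward_operand(rr_2=A0, rd=0,
                             wr_3=A0, rfw_3=True, alur_3=4,
                             wr_4=A0, rfw_4=True, mr_4=3)
print(sel, value)  # 2 4
```

Swapping the two `if` tests would return `(1, 3)` for the same inputs, which is the priority bug described above.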
flowchart TD
A["add a0, a0, a0 in EX wants a0"] --> B{"RR0_2 == WR_3 and RFW_3?"}
B -- yes --> C["FRD0 = 2 -> forward ALUR_3 (=4)"]
B -- no --> D{"RR0_2 == WR_4 and RFW_4?"}
D -- yes --> E["FRD0 = 1 -> forward MR_4 (=3)"]
D -- no --> F["FRD0 = 0 -> use RD0 from RegFile"]
5. The Load-Use Stall (Worked: sd/ld/add)
The second board is Question 7, parts (6)-(9). This is the case forwarding cannot fully solve, because a load's value is not ready until the end of MEM — too late to forward into the very next instruction's EX.
Here is the diagram we drew. B (in red on the board) marks the inserted bubble, and the row "wraps" because the dependent add is held back one cycle:
c1 c2 c3 c4 c5 c6 c7 c8 c9
addi a1 | I | D | E | M | W | | | | |
sd a1 | | I | D | E | M | W | | | |
ld a2 | | | I | D | E | M | W | | |
add a0 | | | | I | D |(B) | E | M | W | <- stalled 1 cycle
The add reaches WB in cycle 9 — that is the circled (9) answer.
Walking the timeline
- `ld a2` produces `a2` at the end of its MEM stage (cycle 6).
- Without a stall, the `add` would be in EX in cycle 6 and would need `a2` at the start of that cycle, which is too early. Forwarding from EX/MEM does not help because the load result is not in EX/MEM yet; it is still being read from RAM.
- So we stall one cycle: freeze the `add` (and everything behind it) in place for cycle 6, inserting a bubble. Now the `add` is in EX in cycle 7, and by then `a2` is available in MEM/WB (`MR_4`) and can be forwarded.
So this program needs both a stall (one cycle) and forwarding (the load result is then forwarded from WB into the add's EX).
Answers to parts (6)-(9)
- (6) Cycles: 9.
- (7) Flushing: No. There is no taken branch or jump, so nothing in flight must be discarded.
- (8) Stalling: Yes, exactly one cycle, caused by the load-use dependency between `ld a2` and `add a0, a2, a2`.
- (9) Forwarding: Yes. After the one-cycle stall, the loaded `a2` is forwarded from the MEM/WB side into the `add`'s EX inputs. (`addi a1` -> `sd a1` is also a forward of `a1` into the store.)
Hazard Unit logic for the load-use stall (05-ld-stl)
Detect a load that writes a register the following instruction reads: the load's signals carry the `_3` suffix (`RFW_3`, `MLD_3`, `WR_3`) and the consumer's read selectors carry `_2` (`RR0_2`, `RR1_2`). On a match, stall the front of the pipe and clear the EX/MEM register (inject a NOP downstream):
if ((RFW_3 == 1) && (MLD_3 == 1) &&
((RR0_2 == WR_3) || (RR1_2 == WR_3))) {
PC_EN = 0; // freeze PC
IF_DR_EN = 0; // freeze IF/DR
DR_EX_EN = 0; // freeze DR/EX
EX_MEM_CLR = 1; // flush EX/MEM -> inject a bubble
} else {
PC_EN = EN_ORG; // preserve original control
IF_DR_EN = 1;
DR_EX_EN = 1;
EX_MEM_CLR = CLR_ORG;
}
The key discipline from the spec: preserve the original EN and CLR lines in the else branch (EN_ORG, CLR_ORG) so manual single-stepping and debugging still behave.
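To experiment with the stall condition outside Digital, the same logic can be written as a Python function. This is a sketch under assumptions: the function name `stall_controls` is hypothetical, and the defaults `en_org=1`, `clr_org=0` stand for "running normally, nothing being cleared", which the spec may define differently.

```python
def stall_controls(rfw_3, mld_3, wr_3, rr0_2, rr1_2, en_org=1, clr_org=0):
    """Return the four control lines driven by the load-use check above."""
    load_use = bool(rfw_3 and mld_3 and (rr0_2 == wr_3 or rr1_2 == wr_3))
    if load_use:
        return {"PC_EN": 0, "IF_DR_EN": 0, "DR_EX_EN": 0, "EX_MEM_CLR": 1}
    # No hazard: pass the original enable/clear through unchanged.
    return {"PC_EN": en_org, "IF_DR_EN": 1, "DR_EX_EN": 1, "EX_MEM_CLR": clr_org}

A2 = 12  # a2 is register x12

# ld a2 followed by add a0, a2, a2 -> stall
print(stall_controls(rfw_3=1, mld_3=1, wr_3=A2, rr0_2=A2, rr1_2=A2))
# Producer is not a load (MLD_3 = 0) -> forwarding handles it, no stall
print(stall_controls(rfw_3=1, mld_3=0, wr_3=A2, rr0_2=A2, rr1_2=A2))
```

Calling it with `en_org=0` and no hazard returns `PC_EN = 0`, which is the manual single-stepping case the `else` branch is meant to preserve.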
flowchart LR
L["ld a2 in EX (MLD_3=1)"] -->|produces a2 end of MEM| M[MEM]
A["add a2 in DR wants a2"] -->|too early to forward| STALL[Stall 1 cycle]
STALL --> FWD["then forward a2 from MEM/WB into EX"]
6. Back-to-Back Loads Need Only a Single-Cycle Stall
The third board (bottom of the page) addressed a subtle question: if we have two loads in a row and then a use, how many stalls?
Intuition might say "two stalls, one per load." But only one stall is needed. The diagram showed the add stalled exactly one cycle (the red E box and the red empty bubble boxes mark where the stall lands):
c1 c2 c3 c4 c5 c6 c7 c8 c9
ld a2 | I | D | E | M | W | | | | |
ld a3 | | I | D | E | M | W | | | |
add a0 | | | I | D |(--)| E | M | W | | <- one bubble
Why one stall suffices:
- `ld a2` writes back in cycle 5; `ld a3` writes back in cycle 6.
- After a single stall, the `add` is in EX in cycle 6. By then:
  - `a2` was written back in cycle 5, so it has already reached the RegFile / MEM-WB side and is available to the `add`.
  - `a3` is arriving from `ld a3`, which is in WB in cycle 6, so it can be forwarded from MEM/WB (`MR_4`).
- The clock-inversion assumption (RegFile written on one clock edge, read on the other) means the most recent load's write-back lines up with the dependent read after just one bubble. No second stall is required.
This is exactly the kind of timing reasoning the final expects: don't add stalls mechanically — line up the producer's write-back with the consumer's read and add only as many bubbles as the alignment actually demands.
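The alignment argument can be turned into a small calculation. The sketch below is an illustration only and rests on stated assumptions: an ALU result can be forwarded into a consumer's EX starting the cycle after the producer's EX, a loaded value starting the cycle after the producer's MEM, a value stays available once it is ready, and the consumer's fetch cycle is given before any stall is applied. The names `earliest_use_cycle` and `bubbles_needed` are hypothetical.

```python
def earliest_use_cycle(kind, if_cycle):
    """First cycle in which a consumer's EX stage can receive this producer's value."""
    if kind == "load":
        return if_cycle + 4   # forwardable from MEM/WB while the load is in WB
    return if_cycle + 3       # ALU result: forwardable from EX/MEM while the producer is in MEM

def bubbles_needed(producers, consumer_if_cycle):
    """producers: list of (kind, if_cycle). Returns stall cycles for the consumer."""
    natural_ex = consumer_if_cycle + 2            # IF, DR, then EX
    ready = max(earliest_use_cycle(k, c) for k, c in producers)
    return max(0, ready - natural_ex)

# ld a2 (IF in c1), ld a3 (IF in c2), add a0, a2, a3 (IF in c3)
print(bubbles_needed([("load", 1), ("load", 2)], consumer_if_cycle=3))  # 1
```

Only the most recent load sets the bound, which is why the count is one bubble rather than one per load.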
7. Control Hazards: Flushing on Jumps and Branches
The remaining hazard class is control hazards: when jal/jalr/beq/bne/blt/bge change the PC, the instructions already fetched behind them must not commit.
In the Project 07 starter, jal needs four nops after it. With the Hazard Unit we instead resolve the jump in EX and flush the two instructions that were fetched behind it.
Test `06-jal-fls` exercises this case. Datapath/control changes from the spec:
- The second input to the PCBr MUX comes from the ALU result in EX (not from the `MR_4` MUX).
- The PCBr MUX selector comes from `PCbr_2` (the EX-stage flag), not `PCbr_4`.
- When a jump/taken-branch is detected in EX, clear the two pipeline registers behind EX (IF/DR and DR/EX) so the wrongly-fetched instructions become bubbles:
if (PCbr_2 == 1) {
IF_DR_CLR = 1; // flush the instruction in IF/DR
DR_EX_CLR = 1; // flush the instruction in DR/EX
} else {
IF_DR_CLR = CLR_ORG;
DR_EX_CLR = CLR_ORG;
}
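The flush logic and the PCBr MUX can be modelled together in Python to see how they react to the same `PCbr_2` flag. This is a minimal sketch: `flush_controls` and `next_pc` are hypothetical names, `clr_org=0` is an assumed default, and `pc + 4` stands for the normal sequential fetch.

```python
def flush_controls(pcbr_2, clr_org=0):
    """Clear lines for IF/DR and DR/EX, following the flush logic above."""
    if pcbr_2 == 1:
        return {"IF_DR_CLR": 1, "DR_EX_CLR": 1}
    # No jump or taken branch in EX: keep the original clear line.
    return {"IF_DR_CLR": clr_org, "DR_EX_CLR": clr_org}

def next_pc(pc, pcbr_2, alu_result_ex):
    """PCBr MUX: take the EX-stage ALU result on a jump/taken branch, else PC + 4."""
    return alu_result_ex if pcbr_2 == 1 else pc + 4

# Jump resolved in EX: redirect the PC and turn the two younger instructions into bubbles.
print(hex(next_pc(0x10, 1, 0x40)), flush_controls(1))
# Ordinary instruction in EX: sequential fetch, nothing cleared.
print(hex(next_pc(0x10, 0, 0x40)), flush_controls(0))
```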
flowchart TD
J["jal in EX: PCbr_2 = 1"] --> P["PC <- ALU result (target)"]
J --> F1["IF_DR_CLR = 1 (flush)"]
J --> F2["DR_EX_CLR = 1 (flush)"]
F1 --> N["wrongly-fetched 'unimp' becomes a bubble"]
F2 --> N
Test 07-branch (a simple conditional branch) passes for free once this flush logic is correct, because a taken branch is the same control-hazard mechanism. Test 08-fibrec (computing fibrec(10) = 55) is the full program that only runs when forwarding, stalling, and flushing all work together.
Historical note: branch delay slots
Early RISC processors avoided this hardware entirely by defining the instruction right after a branch to always execute (the branch delay slot); the compiler filled it with useful work or a nop. Modern designs prefer hardware flushing/prediction over exposing the delay slot in the ISA.
8. Deciding Forward vs. Stall vs. Flush (Decision Procedure)
This is the heart of the exam. Given any program on the full pipeline, classify each dependency:
flowchart TD
START["Dependency between instr A (producer) and B (consumer)?"] --> Q1{Control change? jal/jalr/taken branch}
Q1 -- yes --> FLUSH["FLUSH the instructions fetched behind it"]
Q1 -- no --> Q2{Is the producer a LOAD and B is the very next instruction needing it?}
Q2 -- yes --> STALL["STALL one cycle, then FORWARD"]
Q2 -- no --> Q3{Producer's result available in EX/MEM or MEM/WB when B is in EX?}
Q3 -- yes --> FWD["FORWARD only"]
Q3 -- no --> NONE["No action (no hazard)"]
A compact rule set:
| Situation | Mechanism | Why |
|---|---|---|
| ALU result used three or more instructions later | None (already in RegFile) | Value written back before the consumer's DR read (with the inverted RegFile clock) |
| ALU result used by next 1-2 instructions | Forward (EX/MEM or MEM/WB) | Value exists in pipeline before consumer's EX |
| Load result used by the immediately following instruction | Stall 1 + forward | Load value not ready until end of MEM |
| Two loads then a combined use | Stall 1 + forward both | One bubble aligns both producers |
| `jal`/`jalr`/taken branch | Flush (clear IF/DR and DR/EX) | Fetched-behind instructions must not commit |
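The rule set can be written as one small Python function for self-checking. This is a sketch with assumed conventions: the function name `classify`, the `kind` strings, and the `distance` parameter (1 means the consumer is the very next instruction) are all invented for this illustration, and the "none" case assumes the inverted RegFile clock.

```python
def classify(kind, distance=1):
    """kind: 'jump', 'taken_branch', 'load', or 'alu'.
    distance: 1 if the consumer is the next instruction, 2 for the one after, and so on."""
    if kind in ("jump", "taken_branch"):
        return "flush"
    if kind == "load" and distance == 1:
        return "stall+forward"
    if distance <= 2:
        return "forward"
    return "none"

print(classify("alu", 1))    # forward
print(classify("load", 1))   # stall+forward
print(classify("alu", 3))    # none
print(classify("jump"))      # flush
```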
9. Counting Clock Cycles
Cycle counting is a guaranteed exam question. The procedure:
- Base latency. The first instruction takes 5 cycles to reach WB; each subsequent instruction adds 1 cycle if there are no hazards. For `n` instructions with no stalls or flushes: `cycles = n + 4`.
- Add a cycle per stall (bubble). Each load-use stall adds 1.
- Flushes add no stall cycles, but the flushed instructions still used fetch slots. With the jump or taken branch resolved in EX, the target is fetched two cycles later than the next sequential instruction would have been, so those two wasted slots appear in the diagram before the target's row.
Worked counts from this session:
| Program | Instructions (n) | Stalls | Cycles to last WB |
|---|---|---|---|
| `addi a1; addi a2; add a0` | 3 | 0 (forward only) | 3 + 4 = 7 |
| `addi a1; sd a1; ld a2; add a0,a2,a2` | 4 | 1 (load-use) | 4 + 4 + 1 = 9 |
| `ld a2; ld a3; add a0,a2,a3` | 3 | 1 (single, back-to-back loads) | 3 + 4 + 1 = 8 |
Always anchor on "when does the last instruction reach WB," and read it straight off the diagram. The formula is a check, not a substitute for drawing the diagram.
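As a quick check of the formula against the three worked programs, the arithmetic can be scripted. This is only a sketch of the `n + 4 + stalls` check; the function name `cycles_to_last_wb` is hypothetical, and flushed fetch slots are deliberately not modelled.

```python
def cycles_to_last_wb(n_instructions, stalls=0):
    """Cycles until the last instruction is in WB: n + 4, plus one per bubble."""
    return n_instructions + 4 + stalls

worked = [
    ("addi a1; addi a2; add a0", 3, 0),
    ("addi a1; sd a1; ld a2; add a0,a2,a2", 4, 1),
    ("ld a2; ld a3; add a0,a2,a3", 3, 1),
]
for program, n, stalls in worked:
    print(f"{program}: {cycles_to_last_wb(n, stalls)} cycles")
```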
10. Add-an-Instruction: LWU (Sign vs. Zero Extension)
Lab 11 Question 5(5) asks how to add LWU (load word unsigned) to the single-cycle processor. LW and LWU both read a 32-bit word from memory into a 64-bit register; the only difference is the upper 32 bits:
- LW: sign-extend the loaded word (copy bit 31 into bits 63..32).
- LWU: zero-extend the loaded word (fill bits 63..32 with 0).
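The difference between the two fills is easy to see on a word whose bit 31 is set. The Python sketch below is illustrative only; `lw_value` and `lwu_value` are hypothetical helper names that model what ends up in the 64-bit register, not the circuit itself.

```python
MASK32 = 0xFFFF_FFFF

def lw_value(word):
    """Sign-extend a 32-bit word to 64 bits (copy bit 31 into bits 63..32)."""
    word &= MASK32
    if word & 0x8000_0000:
        word |= 0xFFFF_FFFF_0000_0000
    return word

def lwu_value(word):
    """Zero-extend a 32-bit word to 64 bits (bits 63..32 are 0)."""
    return word & MASK32

word = 0x8000_0001
print(hex(lw_value(word)))   # 0xffffffff80000001
print(hex(lwu_value(word)))  # 0x80000001
```

For a word with bit 31 clear, such as `0x7FFFFFFF`, both functions return the same value, so the two instructions differ only on "negative" words.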
What changes
- Datapath / components. The load path that extends a 32-bit value to 64 bits currently does sign extension only. Add the ability to select between sign-extend and zero-extend. The clean way: feed both a sign-extended and a zero-extended version into a small MUX and pick with a new control line (call it `ZEXT`, or extend `MSZ`).
+-- signext(word) --+
word[31:0] ---+ |--MUX--> 64-bit value to RegFile
+-- zeroext(word) --+
^
ZEXT (1 for LWU, 0 for LW)
- Control path. Add the new `ZEXT` (or extended `MSZ`) control line, asserted only for LWU.
- Instruction decoder. LWU is opcode `0000011` (the load group) with `funct3 = 110`. Add comparators to detect `funct3 == 110`, and have that drive `ZEXT = 1` while still asserting the normal load controls (M2R/WDsel select RAM, RFW = 1, ALU computes base+offset).
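The decoder comparison can be sketched in Python by pulling the opcode and `funct3` fields out of an instruction word. This is an illustration, not the Project 06 decoder: `encode_load` and `decode_load_controls` are hypothetical helpers, and the returned `LOAD` flag is a stand-in for the normal load controls listed above.

```python
LOAD_OPCODE = 0b0000011
FUNCT3_LW, FUNCT3_LWU = 0b010, 0b110

def encode_load(funct3, rd, rs1, imm=0):
    """Build an I-type load instruction word (helper for this example only)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | LOAD_OPCODE

def decode_load_controls(instr):
    """Compare opcode (bits 6..0) and funct3 (bits 14..12), as the comparators would."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    is_load = opcode == LOAD_OPCODE
    return {"LOAD": int(is_load), "ZEXT": int(is_load and funct3 == FUNCT3_LWU)}

lwu = encode_load(FUNCT3_LWU, rd=5, rs1=10)   # lwu t0, 0(a0)
lw = encode_load(FUNCT3_LW, rd=5, rs1=10)     # lw  t0, 0(a0)
print(hex(lwu), decode_load_controls(lwu))    # 0x56283 with ZEXT = 1
print(hex(lw), decode_load_controls(lw))      # 0x52283 with ZEXT = 0
```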
Variation: LBU (load byte unsigned)
The same recipe applies to LBU (Lab 11 Q6-5): read 8 bits, then zero-extend bits 63..8. Reuse the same ZEXT MUX on the byte-load path; decode funct3 == 100 in the load group.
11. Add-an-Instruction: SWPR (Two Register Writes in One Cycle)
Lab 11 Question 5(6) asks for SWPR (swap registers): swpr a0, a1 swaps the two registers in one instruction, replacing the usual three-mv idiom.
The hard part: a swap must write two registers in the same clock cycle, but the standard RegFile writes only one (single WR/WD/WE).
RegFile changes
- Add a second write port: `WR2` (which register), `WD2` (the data), `WE2` (enable).
- Internally, a swap routes `a1`'s old value into `a0` and `a0`'s old value into `a1`:
  - Port 1: `WR = a0`, `WD = RD1` (the old `a1`), `WE = 1`.
  - Port 2: `WR2 = a1`, `WD2 = RD0` (the old `a0`), `WE2 = 1`.
- Collision policy. If both ports somehow target the same register (e.g. `swpr a0, a0`), define a priority (give `WD`, port 1, priority over `WD2`) or disallow it. Add a MUX inside the RegFile decode that resolves which write data wins per destination.
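A behavioural model of the two-port RegFile makes the swap routing and the collision policy concrete. The Python below is a sketch under assumptions: the class name `RegFile2W` and the helper `swpr` are hypothetical, port 1 is given priority as one of the two policies suggested above, and writes to `x0` are ignored as in standard RISC-V.

```python
class RegFile2W:
    """32 registers, two read outputs, two write ports (illustrative model)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rr0, rr1):
        return self.regs[rr0], self.regs[rr1]

    def write(self, wr, wd, we, wr2=0, wd2=0, we2=0):
        # Both ports commit on the same clock edge. Port 2 is applied first so
        # that port 1 wins if both target the same register.
        if we2 and wr2 != 0:
            self.regs[wr2] = wd2
        if we and wr != 0:          # x0 stays zero
            self.regs[wr] = wd

def swpr(rf, ra, rb):
    """swpr ra, rb: read both old values, then write them crosswise in one step."""
    rd0, rd1 = rf.read(ra, rb)
    rf.write(wr=ra, wd=rd1, we=1, wr2=rb, wd2=rd0, we2=1)

rf = RegFile2W()
rf.regs[10], rf.regs[11] = 1, 2     # a0 = 1, a1 = 2
swpr(rf, 10, 11)
print(rf.regs[10], rf.regs[11])     # 2 1
```

With this policy a self-swap such as `swpr a0, a0` leaves the register unchanged, because both ports write the same old value.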
Datapath / control / decoder changes
- Datapath. Add the second write path: wire `WR2`/`WD2`/`WE2` from the decoder and from the read-data outputs (`WD2 = RD0`, while `WD = RD1`).
- Control. A new `SWPR` control line that asserts `WE2` and selects the swap routing on the write-data MUXes.
- Instruction format / decoder. SWPR is a new instruction, so you must choose an encoding (an unused opcode/funct combination) that carries two register fields. The decoder detects it and asserts the swap controls; `RFW`/`WE` and the new `WE2` are both 1, and no immediate or ALU operation is needed.
Variation: ADDI2 (add immediate to two registers)
Lab 11 Q6-6 (addi2 a0, a0, 8 updates both a0 and a1) uses the same two-write-port machinery: port 1 writes rd = rd + imm, port 2 writes rd+1 = (rd+1) + imm. You additionally need a second ALU (or to reuse the adder) to compute the second sum, plus a read of rd+1.
flowchart LR
RD0["RD0 = old a0"] --> P2["Port2: WR2=a1, WD2=RD0"]
RD1["RD1 = old a1"] --> P1["Port1: WR=a0, WD=RD1"]
P1 --> RF[(RegFile, 2 write ports)]
P2 --> RF
SW["SWPR control: WE=1, WE2=1"] --> RF
12. Project 07 Submission Mechanics and Common Mistakes
We closed the lab on logistics because several repos were failing the autograder for avoidable reasons.
Submission structure. The top-level processor circuit must be named project07.dig and committed at the expected path. The most common failure was a nested final directory — the files ended up one folder too deep, so the autograder could not find project07.dig. Fix: push a corrected repository immediately (ideally same day) to avoid extra penalties. There is no interactive grading for Project 07, so the autograder result is what counts; get the structure right.
Rubric reminder (where the points are):
| Points | Test(s) | What it exercises |
|---|---|---|
| 10 | `00-add-3nop`, `01-add-2nop`, `02-jal`, `03-ld` | Starter + invert RegFile clock |
| 50 | `04-add-fwd` | Forwarding (FRD0/FRD1) |
| 20 | `05-ld-stl` | Load-use stall + bubble |
| 10 | `06-jal-fls` | Jump/branch flush |
| 5 | `07-branch` | Conditional branch (free once flush works) |
| 5 | `08-fibrec` | Full program, fibrec(10) = 55 |
The single cheapest win. Test 01-add-2nop passes by inverting the CLK input to the RegFile in the DR stage. WB then writes on the first half of the cycle and DR reads on the second half, removing one of the three nops. Tests 02-jal and 03-ld also start passing once you do this.
Debugging procedure (from the lab):
- Build the Hazard Unit incrementally — get forwarding working before touching stalls or flushes.
- Paste the `.dig` test into your processor so you can run it directly in Digital and see where it diverges.
- `objdump` the test so you can see hex instruction words next to assembly, then single-step to find the first instruction that misbehaves.
- Add probes on the datapath (`FRD0`, `FRD1`, `ALUR_3`, `MR_4`, the EN/CLR lines) to confirm whether the bug is a wrong control line or a wrong value.
Common wiring mistakes:
- Forgetting that the EX MUX output must replace `RD0`/`RD1` everywhere, not just at the ALU input.
- Getting the forwarding priority backwards (must test `_3` before `_4`).
- Not preserving `EN_ORG`/`CLR_ORG` in the stall/flush logic, which breaks manual stepping.
- Driving the PCBr MUX selector from `PCbr_4` instead of `PCbr_2` for the flush case.
Key Concepts
| Concept | Definition | Example |
|---|---|---|
| Pipeline register | A register between two stages that carries an instruction's signals forward one stage per clock | DR/EX, EX/MEM, MEM/WB |
| Data hazard | A consumer reads a register before the producer has written it back | add a0,a1,a2 right after addi a1,... |
| Forwarding | Routing a result from EX/MEM or MEM/WB directly back into EX inputs | FRD0 = 2 forwards ALUR_3 |
| Load-use hazard | A load's value is needed by the very next instruction's EX, but isn't ready until end of MEM | ld a2 then add a0,a2,a2 |
| Stall (bubble) | Freezing earlier stages one cycle and injecting a NOP downstream | one bubble before a load-dependent add |
| Control hazard | A jump/taken branch makes already-fetched instructions invalid | instruction after jal |
| Flush | Clearing pipeline registers so wrongly-fetched instructions become bubbles | IF_DR_CLR=1, DR_EX_CLR=1 on jal |
| Clock inversion (RegFile) | Writing on one clock edge and reading on the other to remove one NOP | passes 01-add-2nop |
| Sign vs. zero extension | Fill upper bits with the sign bit (LW) vs. with zeros (LWU) | lwu zero-fills bits 63..32 |
| Second write port | RegFile able to write two registers in one cycle | swpr a0, a1 uses WR/WD and WR2/WD2 |
Practice Problems
Problem 1: Cycle count with forwarding only
On a complete Project 07 pipeline, how many cycles to the last write-back for a three-instruction chain: an `addi` that writes `t0`, an `add` that reads `t0` and writes `t1`, and an `add` that reads `t1` and writes `t2`?
Solution:
- Each `add` depends on the immediately previous instruction's ALU result.
- When `add t1` is in EX (c4), `addi t0`'s result is in EX/MEM -> forward `ALUR_3`.
- When `add t2` is in EX (c5), `add t1`'s result is in EX/MEM -> forward `ALUR_3`.
- All ALU-to-ALU dependencies; **forwarding only, no stalls**.
**Cycles = 3 + 4 = 7.** No flushes, no stalls.
Problem 2: Classify each dependency
For this program on a full pipeline (an `ld` into `a0`, then `add a1, a0, a0`, then a `beq a1, a1` that branches to `done`, then `addi a2, zero, 9` sitting before the `done` label), label each dependency as forward, stall+forward, or flush.
Solution:
- `ld a0` -> `add a1, a0, a0`: **stall 1 + forward**. Load result isn't ready until end of MEM, and `add` is the next instruction needing it.
- `add a1` -> `beq a1, a1, ...`: this is a register dependency on `a1` for the branch comparison; **forward** `a1` into the branch unit / EX.
- `beq` taken -> `addi a2` was fetched behind it: **flush**. The branch jumps to `done`, so `addi a2, zero, 9` must be discarded.
So: one stall (load-use), forwarding (load result and `a1` to the branch), and one flush (taken branch).
Problem 3: Forwarding priority
When `add a0, a0, a0` reaches EX in the sequence `addi a0, zero, 3`; `addi a0, zero, 4`; `add a0, a0, a0`, which value is forwarded for the first operand, and via which selector value?
Solution:
When `add` is in EX:
- `addi a0, zero, 4` is in MEM, so `ALUR_3 = 4`, `WR_3 = a0`, `RFW_3 = 1`.
- `addi a0, zero, 3` is in WB, so `MR_4 = 3`, `WR_4 = a0`, `RFW_4 = 1`.
Both match `RR0_2 == a0`. The Hazard Unit tests `_3` first: **`FRD0 = 2`, forwarding the value `4`.** Result `a0 = 4 + 4 = 8`. Forwarding the WB value (`3`) instead would be the classic priority bug.
Problem 4: Why one stall, not two, for back-to-back loads
Explain, with a timing argument, why ld a2; ld a3; add a0, a2, a3 needs only a single stall.
Solution:
- The `add` needs both `a2` (from `ld a2`) and `a3` (from `ld a3`).
- The binding constraint is the *most recent* producer, `ld a3`, whose value is ready at the end of its MEM (c5).
- After **one** bubble, the `add` is in EX in c6. At that point `a3` can be forwarded from `ld a3`'s MEM/WB result, and `a2` (ready a cycle earlier) is also available to forward.
- A second stall would only delay a value that is already available, so one bubble is sufficient.
**One stall.** Last WB in c8 -> 3 + 4 + 1 = **8 cycles**.
Problem 5: Add LWU to the datapath
List the datapath, control, and decoder changes needed to support lwu t0, (a0).
Solution:
- **Datapath/components:** On the 32-bit load path, provide both a sign-extended and a zero-extended version of the loaded word, and add a 2-input MUX to choose between them.
- **Control:** Add a new control line `ZEXT` (or extend `MSZ`) that selects zero-extend; assert it only for LWU.
- **Decoder:** Detect the load opcode `0000011` with `funct3 = 110` using comparators; on a match, set `ZEXT = 1` while keeping the normal load controls (RFW = 1, M2R/WDsel selects RAM output, ALU computes base + offset). LW (`funct3 = 010`) leaves `ZEXT = 0`.
The only behavioral difference from LW is the upper-32-bit fill (zeros vs. sign bit).
Problem 6: Add SWPR to the RegFile
Describe how SWPR (swpr a0, a1) writes two registers in one cycle, including the collision policy.
Solution:
- **RegFile:** Add a second write port `WR2`/`WD2`/`WE2`.
- **Routing for the swap:**
  - Port 1: `WR = a0`, `WD = RD1` (the old value of `a1`), `WE = 1`.
  - Port 2: `WR2 = a1`, `WD2 = RD0` (the old value of `a0`), `WE2 = 1`.
- **Datapath:** Wire `RD0` to `WD2` and `RD1` to `WD`; route the two destination register numbers to `WR` and `WR2`.
- **Control:** A `SWPR` line asserts both `WE`/`WE2` and selects the swap write-data routing.
- **Decoder:** Choose an unused opcode/funct encoding that carries two register fields; the decoder asserts the swap controls (no immediate, no ALU op).
- **Collision policy:** If both ports target the same register (`swpr a0, a0`), give port-1 (`WD`) priority via a MUX in the RegFile write decode, or disallow the case. Without this, the result on a self-swap is undefined.
Further Reading
- Project 07 spec: /assignments/project07/
- Lab 11 exam-like problems: /assignments/lab11/
- Processor design overview (branching, data memory, debugging): /guides/processor-part-3/
- Source notes (handwritten timing diagrams): "/notes/CS315-01 2025-12-03 Lab Project07 Final Review.pdf"
- RISC-V Instruction Set Manual (riscv.org)
- Digital logic simulator (hneemann/Digital)
- Pipeline hazards overview (Wikipedia)
Summary
- The five stages (I, D, E, M, W) and stage suffixes (`_2`, `_3`, `_4`) are the shared language of every timing diagram and every Hazard Unit equation in Project 07.
- Draw the timing diagram first. Place cycles across the top, instructions down the side, shift each row right by one, and read the last WB to count cycles.
- Forwarding resolves ALU-to-ALU hazards by routing `ALUR_3` (EX/MEM) or `MR_4` (MEM/WB) back into the EX MUXes; closest producer wins, so test `_3` before `_4`, and only forward from instructions that actually write (`RFW`).
- Load-use hazards need a stall plus a forward: freeze PC/IF-DR/DR-EX one cycle, clear EX/MEM to inject a bubble, then forward the loaded value. The `sd`/`ld`/`add` example takes 9 cycles.
- Back-to-back loads need only one stall: align the most recent load's write-back with the dependent read and add only the bubbles the alignment actually requires.
- Control hazards need a flush: resolve `jal`/branch in EX (PCBr MUX from the EX ALU result, selector `PCbr_2`), then clear IF/DR and DR/EX so wrongly-fetched instructions become bubbles.
- Adding an instruction follows a fixed recipe (datapath, control line, component, decoder), whether it is sign-vs-zero extension (LWU/LBU) or a two-write-port operation (SWPR/ADDI2).
- Get the submission structure right: top-level `project07.dig`, no nested final directory, build the Hazard Unit incrementally (forwarding -> stalling -> flushing), and the cheapest points come from inverting the RegFile clock.