# Lab: Processor JAL and JALR

## CS 315 Computer Architecture

---

## Goals for This Lab

- Understand what `jal` and `jalr` do to registers and the PC
- Recognize `call`, `j`, and `ret` as pseudo-instructions built on them
- Add `PCsel`, `WDsel`, and `ALUSrcA` MUXes to the datapath
- Extend the decoder spreadsheet with new control outputs
- Trace a `jal`/`ret` program cycle-by-cycle
Starting point: a working Part 1 processor that runs `addi`, `add`, and `li`.
---

## Why We Need Jumps

Every program so far is a **straight line**: the PC increments by 4 every cycle.
```mermaid
flowchart LR
    A["PC=0 li a0,1"] --> B["PC=4 li a1,2"]
    B --> C["PC=8 add a2,a0,a1"]
    C --> D["PC=12 unimp"]
```
Real programs need **functions**, which require two things:

1. Transfer control to a non-sequential address (the **jump**)
2. Remember where to come back to (the **link**: save `PC + 4`)

---

## The Link: Saving the Return Address

A jump that forgets where it came from is a **one-way trip**. The trick is to save `PC + 4` (the address of the next instruction) into a register *before* jumping.
RISC-V convention: the return address lives in `ra` (= `x1`).
```mermaid
flowchart TD
    M2["main: jal first_s (PC=8)"] -->|"ra = 12, PC = first_s"| F0["first_s: add a0,a0,a1"]
    F0 --> F1["ret"]
    F1 -->|"PC = ra = 12"| M3["unimp (PC=12)"]
```
---

## JAL: Jump And Link

`jal` is a **J-type** instruction. It does two things in one cycle:

```text
jal rd, imm

Step 1 (link):  rd <- PC + 4      # save return address
Step 2 (jump):  PC <- PC + imm    # PC-relative jump
```

- `rd` gets `PC + 4` (the link, or return address)
- The PC jumps by a **signed, PC-relative offset** (`imm`)
The target is relative to the current PC, not an absolute address.
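To check hand traces, the two steps can be sketched as a tiny Python function. This is a hypothetical software model, not the lab's Digital circuit; the name `jal_step` and the 32-bit mask `MASK32` are illustrative assumptions:

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit registers, as in RV32I


def jal_step(pc: int, imm: int) -> tuple[int, int]:
    """Return (link, next_pc) for `jal rd, imm` executed at address `pc`."""
    link = (pc + 4) & MASK32       # Step 1: return address, written to rd
    next_pc = (pc + imm) & MASK32  # Step 2: PC-relative jump
    return link, next_pc


# The Part 2 call: `jal first_s` at PC = 8 with offset +8.
print(jal_step(8, 8))  # (12, 16)
```

A backward jump works the same way: `jal_step(16, -8)` gives `(20, 8)`, because the offset is signed.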
---

## JAL Pseudo-Instructions

You rarely write raw `jal`; the assembler provides friendlier names:

| You write | Assembler emits | Meaning |
|-----------|-----------------|---------|
| `call first_s` | an `auipc` + `jalr ra` pair (the linker may relax it to `jal ra, first_s`) | Save return addr in `ra`, jump |
| `jal first_s` | `jal ra, first_s` | Same (`rd` defaults to `ra`) |
| `j label` | `jal zero, label` | Plain jump; discard link into `x0` |
`j` is just `jal` with `rd = x0`. Writing to `x0` discards the value, so no extra hardware is needed.
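A toy register-file model shows why `j` costs nothing extra: the link write to `x0` is simply dropped. The `RegFile` class below is hypothetical and assumes 32-bit registers:

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit registers


class RegFile:
    """Toy register file: 32 registers, with x0 hardwired to 0."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, r: int) -> int:
        return self.regs[r]

    def write(self, rd: int, value: int) -> None:
        # x0 is hardwired to 0 in RISC-V, so writes to it are ignored.
        if rd != 0:
            self.regs[rd] = value & MASK32


rf = RegFile()
rf.write(0, 12)  # `j`:   link PC + 4 = 12 goes to x0 and vanishes
rf.write(1, 12)  # `jal`: link goes to ra (x1) and is kept
print(rf.read(0), rf.read(1))  # 0 12
```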
---

## J-Type Encoding

The 20 stored immediate bits are **scrambled** across the instruction word:

```text
J-type: | imm[20] | imm[10:1] | imm[11] | imm[19:12] |  rd  | opcode |
bits:   |   31    |   30:21   |   20    |   19:12    | 11:7 |  6:0   |

opcode for jal = 0b1101111
```

Your `ImmDecoder` reassembles these bits combinationally; you never unscramble them by hand. The offset is always **even** (`imm[0] = 0` is implied, not stored), giving a 21-bit signed offset: a range of about ±1 MiB around the `jal`.

---

## JALR: Jump And Link Register

`jalr` is the **register-based** cousin of `jal`. It is an **I-type** instruction:

```text
jalr rd, rs1, imm

Step 1 (link):  rd <- PC + 4       # same link behavior
Step 2 (jump):  PC <- rs1 + imm    # jump to a REGISTER value
```

The RISC-V spec also clears bit 0 of `rs1 + imm`; for word-aligned targets such as `ra`, this changes nothing.

| | `jal` | `jalr` |
|-|-------|--------|
| Format | J-type | I-type |
| PC target | `PC + imm` | `rs1 + imm` |
| Target known at | assemble time | **run time** |

Because the target comes from a register, `jalr` can jump to an address computed at run time, which is exactly what a return needs.

---

## The `ret` Pseudo-Instruction

`ret` is the most common use of `jalr`:

```text
ret  ==>  jalr x0, ra, 0
```

Two special values make it a return:

- **`rd = x0`**: discard `PC + 4` (a return doesn't need a new link)
- **`imm = 0`**: jump exactly to `ra`, with no offset

Net effect: `PC = ra + 0 = ra`
`ra` was set by the `jal`/`call` that invoked the function.
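Here is the same kind of hypothetical Python model for `jalr` (the name `jalr_step` and the 32-bit mask are assumptions). It includes the spec's bit-0 clearing, which your lab datapath may not implement; for a word-aligned `ra` the result is identical:

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit registers


def jalr_step(pc: int, rs1_value: int, imm: int) -> tuple[int, int]:
    """Return (link, next_pc) for `jalr rd, rs1, imm` executed at address `pc`."""
    link = (pc + 4) & MASK32                     # same link as jal
    next_pc = ((rs1_value + imm) & MASK32) & ~1  # spec clears bit 0 of the target
    return link, next_pc


# `ret` (= jalr x0, ra, 0) at PC = 20 with ra = 12, as in the Part 2 program.
print(jalr_step(20, 12, 0))  # (24, 12): the link 24 goes to x0 and is discarded
```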
---

## JAL vs JALR Side by Side

| Property | `jal` | `jalr` |
|----------|-------|--------|
| Format | J-type | I-type |
| opcode | `1101111` | `1100111` |
| Link (`rd =`) | `PC + 4` | `PC + 4` |
| PC update | `PC + imm` | `rs1 + imm` |
| Immediate | `imm-J` (20-bit) | `imm-I` (12-bit) |
| Uses `rs1`? | No | Yes |
| Common alias | `call`, `j` | `ret` |
Both always link. They differ only in how the jump target is formed.
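To see the two immediates side by side, here is a sketch of what an immediate decoder computes, written as hypothetical Python helpers (`sign_extend`, `imm_i`, `imm_j` are illustrative names, not the lab's `ImmDecoder` interface). The test words are the `jal` and `ret` encodings from the Part 2 `objdump` listing later in this lab:

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_i(iw: int) -> int:
    """I-type immediate (addi, jalr): iw[31:20], sign-extended from 12 bits."""
    return sign_extend(iw >> 20, 12)


def imm_j(iw: int) -> int:
    """J-type immediate (jal): unscramble iw[31|30:21|20|19:12] into imm[20:1]."""
    imm = (((iw >> 31) & 0x1) << 20      # imm[20]    <- bit 31
           | ((iw >> 21) & 0x3FF) << 1   # imm[10:1]  <- bits 30:21
           | ((iw >> 20) & 0x1) << 11    # imm[11]    <- bit 20
           | ((iw >> 12) & 0xFF) << 12)  # imm[19:12] <- bits 19:12
    return sign_extend(imm, 21)          # imm[0] is always 0


JAL_WORD = 0x008000EF  # jal ra, first_s
RET_WORD = 0x00008067  # ret = jalr zero, 0(ra)

print(imm_j(JAL_WORD), imm_i(RET_WORD))  # 8 0
```

Decoding each word with the *other* format is instructive too: `imm_i(JAL_WORD)` is also 8, while `imm_j(RET_WORD)` is `0x8000`.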
---

## Call and Return Flow
```mermaid
flowchart LR
    subgraph CALL["jal ra, first_s"]
        direction TB
        A1["ra ← PC+4"] --> A2["PC ← PC+imm"]
    end
    subgraph RET["ret = jalr x0, ra, 0"]
        direction TB
        B1["x0 ← PC+4 (discarded)"] --> B2["PC ← ra + 0"]
    end
    CALL --> RET
```
The shared behavior is the **link**: both write `PC + 4` into `rd`.

---

## The Part 2 Program

**Part 1** (already done): straight-line arithmetic

**Part 2**: a real function call and return

```asm
main:
    li    a0, 1         # addi a0, zero, 1
    li    a1, 2         # addi a1, zero, 2
    jal   first_s       # ra = PC+4, jump to first_s
    unimp               # control returns HERE

first_s:
    add   a0, a0, a1    # a0 = 1 + 2 = 3
    ret                 # jalr x0, ra, 0 -> PC = ra
```

---

## Building the .hex File

Assemble, inspect, and generate the ROM image:

```bash
# assemble
riscv64-unknown-elf-as -o lab10-part2.o lab10-part2.s

# disassemble to verify encodings and offsets
riscv64-unknown-elf-objdump -d lab10-part2.o

# generate .hex for the Digital ROM
python3 makerom3.py lab10-part2.o > lab10-part2.hex
```

Representative `objdump` output:

```text
 0:  00100513   li    a0,1       # addi a0,zero,1
 4:  00200593   li    a1,2
 8:  008000ef   jal   ra,10      # offset = +8 → first_s at 0x10
 c:  c0001073   unimp            # csrrw zero,cycle,zero (4 bytes, so first_s is at 0x10)
10:  00b50533   add   a0,a0,a1
14:  00008067   ret              # jalr zero,0(ra)
```

---

## New Datapath Components

Four MUX changes are needed to support `jal`/`jalr`: three new MUXes plus a wider `ALUSrcB`.

| MUX | Chooses between | Controlled by | Why |
|-----|-----------------|---------------|-----|
| `PCsel` | `PC+4` vs. jump target | `PCsel` | Override sequential PC |
| `WDsel` | ALU result vs. `PC+4` | `WDsel` | Link writes `PC+4`, not ALU |
| `ALUSrcA` | `RD0` vs. `PC` | `ALUSrcA` | `jal` target needs current PC |
| `ALUSrcB` (wider) | `RD1` / `imm-I` / `imm-J` | `ALUSrcB` | Different instructions use different immediates |

---

## How the ALU Computes Every Target

The **same ALU add** computes all jump targets; only the operands change:

```text
jal:   ALU.A = PC,   ALU.B = imm-J  →  target = PC + imm-J
jalr:  ALU.A = RD0,  ALU.B = imm-I  →  target = rs1 + imm
add:   ALU.A = RD0,  ALU.B = RD1    →  rd = rs1 + rs2
addi:  ALU.A = RD0,  ALU.B = imm-I  →  rd = rs1 + imm
```
The MUXes before and after the ALU specialize the behavior; the ALU itself just adds (`ALUOp = 000` for all four instructions).
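A Python sketch of the operand MUXes makes the point concrete: only the select values differ between instructions. The function name, keyword arguments, and 32-bit mask are illustrative assumptions; the MUX input numbering follows the decoder table in this lab:

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit datapath


def alu_result(alusrc_a: int, alusrc_b: int, *, pc: int, rd0: int, rd1: int,
               imm_i: int, imm_j: int) -> int:
    """ALUSrcA/ALUSrcB MUXes feeding an ALU that always adds."""
    a = pc if alusrc_a == 1 else rd0                     # ALUSrcA: 0 = RD0, 1 = PC
    b = {0b00: rd1, 0b01: imm_i, 0b10: imm_j}[alusrc_b]  # ALUSrcB: 00 = RD1, 01 = imm-I, 10 = imm-J
    return (a + b) & MASK32


# jal at PC = 8 (ALUSrcA = 1, ALUSrcB = 10): target = 8 + 8
print(alu_result(1, 0b10, pc=8, rd0=0, rd1=0, imm_i=8, imm_j=8))         # 16
# ret at PC = 20 with ra = 12 (ALUSrcA = 0, ALUSrcB = 01): target = 12 + 0
print(alu_result(0, 0b01, pc=20, rd0=12, rd1=0, imm_i=0, imm_j=0x8000))  # 12
```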
---

## Datapath Sketch

```text
             ┌─────────┐
PC ─────────►│ ALUSrcA ├─A─►┌─────┐
RD0 (rs1) ──►│   MUX   │    │     │
             └─────────┘    │ ALU ├─R─►┬──► WDsel MUX ──► RegFile WD
                            │     │    │          ▲
             ┌─────────┐    │     │    │          │
RD1 (rs2) ──►│         │    │     │    │   PC+4 ──┘ (link value)
imm-I ──────►│ ALUSrcB ├─B─►│     │    │
imm-J ──────►│   MUX   │    └─────┘    └──► ALU target (to PCsel)
             └─────────┘

             ┌─────────┐
PC+4 ───────►│  PCsel  ├──► next PC
ALU target ─►│   MUX   │
             └─────────┘
```

`PC + 4` fans out to two places: the default input (input 0) of `PCsel` and the link input of `WDsel`.

---

## Extended Decoder Table

| INUM | Instr | opcode | RFW | ALUOp | ALUSrcB | ALUSrcA | WDsel | PCsel |
|------|-------|--------|-----|-------|---------|---------|-------|-------|
| 0 | addi | 0010011 | 1 | 000 | 01 | 0 | 0 | 0 |
| 1 | add | 0110011 | 1 | 000 | 00 | 0 | 0 | 0 |
| 2 | **jal** | **1101111** | **1** | **000** | **10** | **1** | **1** | **1** |
| 3 | **jalr** | **1100111** | **1** | **000** | **01** | **0** | **1** | **1** |

Key: `ALUSrcA=1` for `jal` (uses PC); `ALUSrcA=0` for `jalr` (uses `rs1`). The old rows get `ALUSrcA=WDsel=PCsel=0`, which preserves Part 1 behavior.

---

## Reading the New Decoder Rows

**`jal` (INUM 2):**

- `ALUSrcA=1` → PC into ALU input A (to compute `PC + imm-J`)
- `ALUSrcB=10` → select `imm-J`
- `WDsel=1` → write `PC+4` (the link) into `rd`
- `PCsel=1` → next PC = ALU target

**`jalr` (INUM 3):**

- `ALUSrcA=0` → `RD0` (`rs1`, which is `ra` for `ret`) into ALU input A
- `ALUSrcB=01` → select `imm-I` (0 for `ret`)
- `WDsel=1` → write the link; `rd=x0` for `ret`, so it is discarded
- `PCsel=1` → next PC = ALU target

---

## Decoder Circuit Changes
```mermaid
flowchart LR
    IW["Instruction Word"] --> SP["splitters: opcode, funct3, funct7"]
    SP --> CMP["comparators: one per instruction"]
    CMP --> PE["priority encoder → INUM"]
    PE --> ROM["control ROM indexed by INUM"]
    ROM --> SPL["output splitter"]
    SPL --> CTL["RFW, ALUOp, ALUSrcB,\nALUSrcA, WDsel, PCsel"]
```
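The diagram's path from instruction word to control bits can be sketched in Python. This is a hypothetical model: a dictionary lookup on the opcode stands in for the comparators and priority encoder (it skips the `funct3`/`funct7` checks), and packing `RFW` into the most significant bit is an assumption about how your ROM output splitter is wired:

```python
# opcode -> INUM (opcode only; a real decoder also checks funct3/funct7 where needed)
OPCODE_TO_INUM = {
    0b0010011: 0,  # addi
    0b0110011: 1,  # add
    0b1101111: 2,  # jal
    0b1100111: 3,  # jalr
}

# Control ROM contents, one row per INUM, copied from the decoder table:
# (RFW, ALUOp, ALUSrcB, ALUSrcA, WDsel, PCsel)
CONTROL_ROWS = [
    (1, 0b000, 0b01, 0, 0, 0),  # 0: addi
    (1, 0b000, 0b00, 0, 0, 0),  # 1: add
    (1, 0b000, 0b10, 1, 1, 1),  # 2: jal
    (1, 0b000, 0b01, 0, 1, 1),  # 3: jalr
]
FIELD_WIDTHS = (1, 3, 2, 1, 1, 1)  # 9 bits in total


def pack(fields: tuple[int, ...]) -> int:
    """Concatenate control fields into one ROM word, first field in the MSBs."""
    word = 0
    for value, width in zip(fields, FIELD_WIDTHS):
        assert 0 <= value < (1 << width), "field value too wide"
        word = (word << width) | value
    return word


def decode(iw: int) -> tuple[int, tuple[int, ...]]:
    """Opcode -> INUM, then look up that row of the control ROM."""
    inum = OPCODE_TO_INUM[iw & 0x7F]
    return inum, CONTROL_ROWS[inum]


for iw in (0x00100513, 0x00B50533, 0x008000EF, 0x00008067):
    inum, fields = decode(iw)
    print(inum, format(pack(fields), "09b"))
# 0 100001000
# 1 100000000
# 2 100010111
# 3 100001011
```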
Two concrete changes from Part 1:

1. Add comparators for opcodes `1101111` and `1100111`; wire them to INUM 2 and 3
2. Widen the ROM word to 9 bits: `RFW(1) + ALUOp(3) + ALUSrcB(2) + ALUSrcA(1) + WDsel(1) + PCsel(1)`

---

## Cycle-by-Cycle Trace

Memory layout: `addi` @0, `addi` @4, `jal` @8, `unimp` @12, `add` @16, `ret` @20

| Cycle | PC | Instruction | a0 | a1 | ra | Next PC |
|-------|----|-------------|----|----|----|---------|
| 1 | 0 | `addi a0,zero,1` | 1 | - | - | 4 |
| 2 | 4 | `addi a1,zero,2` | 1 | 2 | - | 8 |
| 3 | 8 | `jal ra,16` | 1 | 2 | **12** | **16** |
| 4 | 16 | `add a0,a0,a1` | **3** | 2 | 12 | 20 |
| 5 | 20 | `jalr zero,ra,0` | 3 | 2 | 12 | **12** |
| 6 | 12 | `unimp` | 3 | 2 | 12 | halt |

Final: `a0 = 3` (the sum), `ra = 12` (return address correctly preserved).

---

## Cycle 3 Detail: jal

At PC = 8, executing `jal ra, first_s`:

- `ALUSrcA=1` → ALU A = PC = **8**
- `ALUSrcB=10` → ALU B = `imm-J` = **8**
- ALU computes 8 + 8 = **16** (the jump target)
- `WDsel=1` → write data = `PC + 4` = **12**, written into `ra`
- `PCsel=1` → next PC = **16**
After this cycle: `ra = 12`, `PC = 16`. The call has been made and the return address is saved.
---

## Cycle 5 Detail: ret (jalr)

At PC = 20, executing `jalr zero, ra, 0`:

- `ALUSrcA=0` → ALU A = `RD0` = `ra` = **12**
- `ALUSrcB=01` → ALU B = `imm-I` = **0**
- ALU computes 12 + 0 = **12** (the return target)
- `WDsel=1` → would write `PC + 4 = 24` into `rd`, but `rd = x0`, so it is **discarded**
- `PCsel=1` → next PC = **12** (the `unimp`)
The function has returned. Control is back at the instruction after the original `jal`.
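To check the whole trace at once, here is a sketch of a single-cycle simulator in Python that runs the six instruction words from the `objdump` listing. Everything here is a hypothetical model, not the lab's circuit: it treats every ALU operation as add, matches on opcode only, follows the lab datapath (so it does not clear bit 0 of `jalr` targets), halts on any opcode it does not know, and assumes `unimp` is encoded as `c0001073`:

```python
MASK32 = 0xFFFF_FFFF
UNIMP = 0xC0001073  # assumed `unimp` encoding (csrrw zero, cycle, zero)

# The six words from the Part 2 objdump listing, one per 4-byte address.
ROM = [0x00100513, 0x00200593, 0x008000EF, UNIMP, 0x00B50533, 0x00008067]

# opcode -> (RFW, ALUSrcB, ALUSrcA, WDsel, PCsel); the ALU always adds here.
CONTROL = {
    0b0010011: (1, 0b01, 0, 0, 0),  # addi
    0b0110011: (1, 0b00, 0, 0, 0),  # add
    0b1101111: (1, 0b10, 1, 1, 1),  # jal
    0b1100111: (1, 0b01, 0, 1, 1),  # jalr
}


def sext(value: int, bits: int) -> int:
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_i(iw: int) -> int:
    return sext(iw >> 20, 12)


def imm_j(iw: int) -> int:
    imm = (((iw >> 31) & 1) << 20 | ((iw >> 21) & 0x3FF) << 1
           | ((iw >> 20) & 1) << 11 | ((iw >> 12) & 0xFF) << 12)
    return sext(imm, 21)


def run(rom, max_cycles=20):
    """Execute one instruction per cycle; return (registers, [(pc, next_pc), ...])."""
    regs = [0] * 32
    pc = 0
    trace = []
    for _ in range(max_cycles):
        iw = rom[pc // 4]
        opcode = iw & 0x7F
        if opcode not in CONTROL:  # unimp (or anything unknown): halt
            trace.append((pc, "halt"))
            break
        rfw, alusrc_b, alusrc_a, wdsel, pcsel = CONTROL[opcode]
        rd, rs1, rs2 = (iw >> 7) & 0x1F, (iw >> 15) & 0x1F, (iw >> 20) & 0x1F

        a = pc if alusrc_a else regs[rs1]                                  # ALUSrcA MUX
        b = {0b00: regs[rs2], 0b01: imm_i(iw), 0b10: imm_j(iw)}[alusrc_b]  # ALUSrcB MUX
        alu = (a + b) & MASK32
        pc_plus_4 = (pc + 4) & MASK32

        if rfw and rd != 0:                          # x0 ignores writes
            regs[rd] = pc_plus_4 if wdsel else alu   # WDsel MUX
        next_pc = alu if pcsel else pc_plus_4        # PCsel MUX
        trace.append((pc, next_pc))
        pc = next_pc
    return regs, trace


regs, trace = run(ROM)
for pc, next_pc in trace:
    print(pc, next_pc)
print(f"a0 = {regs[10]}, ra = {regs[1]}")  # a0 = 3, ra = 12
```

To reproduce Common Bug #2, change the `jal` entry's `WDsel` to 0: `ra` then holds 16, and `ret` keeps jumping back into `first_s` until `max_cycles` runs out.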
---

## Common Bug #1: Wrong ALUSrcA

**Symptom**: `jal` jumps correctly but `ret` goes to the wrong address (or vice versa)

**Cause**: `ALUSrcA` is stuck at the same value for both `jal` and `jalr`. For example, with `ALUSrcA=1` on `jalr`, `ret` computes `PC + 0` and jumps back to itself.

| Instruction | Correct `ALUSrcA` | ALU A input |
|-------------|-------------------|-------------|
| `jal` | **1** | PC (to compute `PC + imm-J`) |
| `jalr` | **0** | `RD0` / `rs1` (to compute `rs1 + imm`) |

**Fix**: `ALUSrcA = 1` only for `jal`; `ALUSrcA = 0` for `jalr` and all other instructions.

---

## Common Bug #2: Wrong WDsel (Link Bug)

**Symptom**: `ret` returns to `first_s` instead of `unimp`, producing an infinite loop

**Cause**: `WDsel = 0` for `jal`, so `rd` gets the ALU result (the jump target) instead of `PC + 4`

```text
jal WDsel=0:  ra ← target (first_s address)   ← WRONG
jal WDsel=1:  ra ← PC + 4 = return address     ← correct
```

**Fix**: `WDsel = 1` for both `jal` and `jalr`.

**Diagnosis tip**: After the `jal` cycle, check `ra` on the dashboard. It must equal the address *after* the `jal`, not the function entry.

---

## Common Bug #3: PCsel Never Selects Target

**Symptom**: `jal` "executes" but the PC just advances by 4; the function body is never entered

**Cause**: `PCsel = 0` for the jump rows, or the MUX inputs are swapped

**Fix**: `PCsel = 1` for both `jal` and `jalr`; verify MUX input 0 = `PC+4`, input 1 = target

---

## Common Bug #4: Wrong Immediate

**Symptom**: `jal` (or `ret`) lands at a plausible but wrong address

**Cause**: `ALUSrcB` selects `imm-I` for `jal` (should be `imm-J`) or vice versa

**Fix**: `jal → ALUSrcB=10` (imm-J); `jalr → ALUSrcB=01` (imm-I)

This bug can hide in the Part 2 program: the `jal` word `008000ef` decodes to 8 as *both* `imm-J` and `imm-I`, so a swapped `jal` row still lands on `first_s`. A swapped `jalr` row does show up, because the `ret` word `00008067` decodes to `imm-J = 0x8000`.

---

## Common Bug #5: Forgot to Backfill Old Rows

**Symptom**: `addi`/`add` break after you add the new MUXes

**Cause**: `ALUSrcA`, `WDsel`, `PCsel` are left as `x` or `1` for the old rows

**Rule**: When you add a new control output, set it to its **inert value (0)** for every existing instruction row.
```text
addi:  ALUSrcA=0  WDsel=0  PCsel=0    ← preserves Part 1 behavior
add:   ALUSrcA=0  WDsel=0  PCsel=0
```

The MUXes are wired so that input 0 is the original Part 1 path.

---

## Incremental Development Workflow
```mermaid
flowchart TD
    A["1. Pick instructions: jal then jalr"] --> B["2. Add MUXes: PCsel, WDsel, ALUSrcA"]
    B --> C["3. Wire datapath: PC, PC+4, imm-J"]
    C --> D["4. Extend decoder: new rows + columns"]
    D --> E["5. Test in Digital: single-step + dashboard"]
    E -->|"bug"| C
    E -->|"passes"| F["commit lab10-part2.dig"]
```
---

## Debugging in Digital

Practical tips from the guides:

- **Add dashboard probes** for `PC`, `PC+4`, `iw`, `RS1`, `imm-I`, `imm-J`, the ALU result, and `PCsel`
- **Single-step with `objdump` open**: match each cycle to the expected register values
- **Watch `ra`**: after the `jal` cycle it must equal the `unimp` address; if it equals `first_s`, you have a `WDsel` bug
- **Use `EN`**: press play, select `PROG`, then toggle `EN` to 1 so the PC starts at 0
- **Paste the `.dig` test** into your processor to run the autograder test directly in Digital

---

## Key Concepts Recap

| Concept | Definition |
|---------|------------|
| **Link** | `rd = PC + 4` saved before a jump |
| **`jal`** | J-type; `rd=PC+4`, `PC=PC+imm` (PC-relative) |
| **`jalr`** | I-type; `rd=PC+4`, `PC=rs1+imm` (register-relative) |
| **`ret`** | `jalr x0, ra, 0`: discard link, jump to `ra` |
| **`PCsel`** | MUX: `PC+4` vs. jump target → next PC |
| **`WDsel`** | MUX: ALU result vs. `PC+4` → register write data |
| **`ALUSrcA`** | MUX: `RD0` vs. `PC` → ALU A input (`1` for `jal` only) |

---

## Summary

1. **Functions need a link.** `jal` saves `PC+4` into `rd` before jumping; that saved address is the return address.
2. **`jal` is PC-relative; `jalr` is register-relative.** Same link behavior, different target source.
3. **`ret` = `jalr x0, ra, 0`.** Discard the link (`rd=x0`), jump exactly to `ra` (`imm=0`).
4. **Three new MUXes.** `PCsel` (next PC), `WDsel` (link vs. ALU), `ALUSrcA` (PC vs. register); plus a widened `ALUSrcB`.
5. **Two decoder rows, three new columns.** Backfill `0` for `addi`/`add` to preserve Part 1.
6. **Most bugs are control-line mix-ups.** Wrong `ALUSrcA` → bad target; wrong `WDsel` → `ret` loops; wrong `PCsel` → no jump at all.