# Lab: Processor JAL and JALR

## CS 315 Computer Architecture

---

## Goals for This Lab

- Understand what `jal` and `jalr` do to registers and the PC
- Recognize `call`, `j`, and `ret` as pseudo-instructions built on them
- Add `PCsel`, `WDsel`, and `ALUSrcA` MUXes to the datapath
- Extend the decoder spreadsheet with new control outputs
- Trace a `jal`/`ret` program cycle-by-cycle

<div class="info-box">
Starting point: a working Part 1 processor that runs <code>addi</code>, <code>add</code>, and <code>li</code>
</div>

---

## Why We Need Jumps

Every program so far is a **straight line** — PC increments by 4 every cycle.

<div class="mermaid">
flowchart LR
    A["PC=0 li a0,1"] --> B["PC=4 li a1,2"]
    B --> C["PC=8 add a2,a0,a1"]
    C --> D["PC=12 unimp"]
</div>

Real programs need **functions**, which require:
1. Transfer control to a non-sequential address (jump)
2. Remember where to come back (link = save `PC + 4`)

---

## The Link: Saving the Return Address

A jump that forgets where it came from is a **one-way trip**.

The trick: save `PC + 4` (the next instruction) into a register *before* jumping.

<div class="highlight-box">
RISC-V convention: return address lives in <code>ra</code> (= <code>x1</code>)
</div>

<div class="mermaid">
flowchart TD
    M2["main: jal first_s  (PC=8)"] -->|"ra = 12, PC = first_s"| F0["first_s: add a0,a0,a1"]
    F0 --> F1["ret"]
    F1 -->|"PC = ra = 12"| M3["unimp  (PC=12)"]
</div>

---

## JAL: Jump And Link

`jal` is a **J-type** instruction. It does two things in one cycle:

```text
jal rd, imm

Step 1 (link):  rd  <- PC + 4       # save return address
Step 2 (jump):  PC  <- PC + imm     # PC-relative jump
```

- `rd` gets `PC + 4` (the link / return address)
- PC jumps by a **signed PC-relative offset** (`imm`)

<div class="info-box">
The target is relative to the current PC, not an absolute address.
</div>
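The two steps can be written as a minimal Python sketch. It assumes a 32-bit machine, which is why results are masked to 32 bits. `jal` here is a hypothetical helper, not part of the lab files:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit PC and registers

def jal(pc: int, imm: int) -> tuple[int, int]:
    """Return (link, next_pc) for `jal rd, imm` executed at address pc."""
    link = (pc + 4) & MASK32       # Step 1: value written into rd
    next_pc = (pc + imm) & MASK32  # Step 2: PC-relative jump (imm is signed)
    return link, next_pc

# The Part 2 call: jal at PC=8 with offset +8
print(jal(8, 8))  # (12, 16)
```

A negative `imm` works the same way: `jal(16, -4)` links 20 and jumps back to 12.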

---

## JAL Pseudo-Instructions

You rarely write raw `jal` — the assembler provides friendlier names:

| You write | Assembler emits | Meaning |
|-----------|-----------------|---------|
| `call first_s` | an `auipc ra` + `jalr ra` pair, which the linker can relax to `jal ra, first_s` | Save return addr in `ra`, jump |
| `jal first_s` | `jal ra, first_s` | Same (`rd` defaults to `ra`) |
| `j label` | `jal zero, label` | Plain jump; discard link into `x0` |

<div class="highlight-box">
<code>j</code> is just <code>jal</code> with <code>rd = x0</code>. Writing to <code>x0</code> discards the value — no extra hardware needed.
</div>
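The `j` and `jal label` rows, plus `ret` from later in the lab, amount to a textual rewrite. The toy expander below is a hypothetical illustration of that rewrite, not how GNU `as` is implemented. It leaves out `call`, whose general expansion is an `auipc`+`jalr` pair:

```python
def expand(line: str) -> str:
    """Expand a few jump pseudo-instructions into base jal/jalr form (toy model)."""
    op, *args = line.replace(",", " ").split()
    if op == "j":                       # j label   -> jal zero, label
        return f"jal zero, {args[0]}"
    if op == "jal" and len(args) == 1:  # jal label -> jal ra, label
        return f"jal ra, {args[0]}"
    if op == "ret":                     # ret       -> jalr zero, ra, 0
        return "jalr zero, ra, 0"
    return line                         # already a base instruction

for src in ["j loop", "jal first_s", "ret", "jal ra, first_s"]:
    print(f"{src:16} -> {expand(src)}")
```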

---

## J-Type Encoding

The 20-bit immediate is **scrambled** across the instruction word:

```text
J-type:  | imm[20] | imm[10:1] | imm[11] | imm[19:12] |  rd  | opcode |
bits:    |   31    |  30:21    |   20    |   19:12    | 11:7 |  6:0   |

opcode for jal = 0b1101111
```

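Your `ImmDecoder` reassembles these bits combinationally; you never have to unscramble them by hand.

Bit 0 of the offset is not stored: it is always 0, so jump offsets are always **even**. The 20 stored bits therefore form a 21-bit signed offset, a range of about ±1 MiB around the current PC.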
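The same reassembly can be sketched in Python with shifts and masks. The `ImmDecoder` does this with wires. The function name `imm_j` is hypothetical. The second test word is a backward `jal x0, -4` that was encoded by hand:

```python
def imm_j(iw: int) -> int:
    """Reassemble and sign-extend the J-type immediate from instruction word iw."""
    imm = (
        ((iw >> 31) & 0x1) << 20       # imm[20]    from bit 31
        | ((iw >> 21) & 0x3FF) << 1    # imm[10:1]  from bits 30:21
        | ((iw >> 20) & 0x1) << 11     # imm[11]    from bit 20
        | ((iw >> 12) & 0xFF) << 12    # imm[19:12] from bits 19:12
    )                                  # imm[0] is implicitly 0
    if imm & (1 << 20):                # sign bit of the 21-bit offset
        imm -= 1 << 21
    return imm

print(imm_j(0x008000EF))  # 8   (jal ra, first_s in the Part 2 program)
print(imm_j(0xFFDFF06F))  # -4  (jal x0, -4: a backward jump)
```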

---

## JALR: Jump And Link Register

`jalr` is the **register-based** cousin of `jal`. It is an **I-type** instruction:

```text
jalr rd, rs1, imm

Step 1 (link):  rd  <- PC + 4       # same link behavior
Step 2 (jump):  PC  <- rs1 + imm    # jump to a REGISTER value
```

| Property | `jal` | `jalr` |
|-|-------|--------|
| Format | J-type | I-type |
| PC target | `PC + imm` | `rs1 + imm` |
| Target known at | assemble time | **run time** |

Because the target is in a register, `jalr` can jump to an address computed at runtime — exactly what a return needs.
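A matching sketch for `jalr`, again assuming 32-bit values, computes the target the way this lab's datapath does. The full RISC-V specification additionally clears bit 0 of the computed target. The lab datapath leaves that step out, and it makes no difference for the aligned addresses used here:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit PC and registers

def jalr(pc: int, rs1_value: int, imm: int) -> tuple[int, int]:
    """Return (link, next_pc) for `jalr rd, rs1, imm` as the lab datapath computes it."""
    link = (pc + 4) & MASK32              # same link as jal
    next_pc = (rs1_value + imm) & MASK32  # target comes from a register
    # The spec would use ((rs1_value + imm) & ~1); the lab omits the bit-0 clear.
    return link, next_pc

print(jalr(20, 12, 0))  # (24, 12): the ret in Part 2, with ra = 12
```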

---

## The `ret` Pseudo-Instruction

`ret` is the most common use of `jalr`:

```text
ret   ==>   jalr x0, ra, 0
```

Two special values make it a return:

- **`rd = x0`**: discard `PC + 4` (we don't need a new link on return)
- **`imm = 0`**: jump exactly to `ra`, no offset

Net effect: `PC = ra + 0 = ra`

<div class="info-box">
<code>ra</code> was set by the <code>jal</code>/<code>call</code> that invoked the function.
</div>
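The "discard the link" half of `ret` needs no special case in the instruction itself. It falls out of a register file that ignores writes to `x0`. The minimal sketch below models that. The class name `RegFile` is hypothetical, and the target is read before the write, as a real `jalr` must do when `rd == rs1`:

```python
class RegFile:
    """32 registers; writes to x0 are silently ignored."""

    def __init__(self) -> None:
        self.x = [0] * 32

    def read(self, r: int) -> int:
        return self.x[r]

    def write(self, r: int, value: int) -> None:
        if r != 0:                       # x0 is hard-wired to zero
            self.x[r] = value & 0xFFFFFFFF

ZERO, RA = 0, 1
rf = RegFile()
rf.write(RA, 12)                         # set earlier by jal/call

# ret == jalr x0, ra, 0, executed at PC = 20
pc = 20
target = rf.read(RA) + 0                 # read rs1 first
rf.write(ZERO, pc + 4)                   # link goes to x0 and is discarded
pc = target
print(pc, rf.read(ZERO))                 # 12 0
```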

---

## JAL vs JALR Side by Side

| Property | `jal` | `jalr` |
|----------|-------|--------|
| Format | J-type | I-type |
| opcode | `1101111` | `1100111` |
| Link (`rd =`) | `PC + 4` | `PC + 4` |
| PC update | `PC + imm` | `rs1 + imm` |
| Immediate | `imm-J` (20-bit) | `imm-I` (12-bit) |
| Uses `rs1`? | No | Yes |
| Common alias | `call`, `j` | `ret` |

<div class="highlight-box">
Both always link. They differ only in how the jump target is formed.
</div>

---

## Call and Return Flow

<div class="mermaid">
flowchart LR
    subgraph CALL["jal ra, first_s"]
        direction TB
        A1["ra ← PC+4"] --> A2["PC ← PC+imm"]
    end
    subgraph RET["ret = jalr x0, ra, 0"]
        direction TB
        B1["x0 ← PC+4 (discarded)"] --> B2["PC ← ra + 0"]
    end
    CALL --> RET
</div>

The shared behavior is the **link** — both write `PC + 4` into `rd`.

---

## The Part 2 Program

**Part 1** (already done): straight-line arithmetic

**Part 2**: a real function call and return

```asm
main:
    li a0, 1           # addi a0, zero, 1
    li a1, 2           # addi a1, zero, 2
    jal first_s        # ra = PC+4, jump to first_s
    unimp              # control returns HERE

first_s:
    add a0, a0, a1     # a0 = 1 + 2 = 3
    ret                # jalr x0, ra, 0  -> PC = ra
```

---

## Building the .hex File

Assemble, inspect, and generate the ROM image:

```bash
# assemble
riscv64-unknown-elf-as -o lab10-part2.o lab10-part2.s

# disassemble to verify encodings and offsets
riscv64-unknown-elf-objdump -d lab10-part2.o

# generate .hex for the Digital ROM
python3 makerom3.py lab10-part2.o > lab10-part2.hex
```

Representative `objdump` output:

```text
0:  00100513    li   a0,1       # addi a0,zero,1
4:  00200593    li   a1,2
8:  008000ef    jal  ra,10      # offset = +8 → first_s at 0x10
c:  c0001073    unimp           # 4-byte encoding, so first_s lands at 0x10
10: 00b50533    add  a0,a0,a1
14: 00008067    ret             # jalr zero,0(ra)
```
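To cross-check the listing, the words can be hand-encoded from the R/I/J field layouts. The sketch below uses hypothetical helper names, skips `unimp`, and rebuilds each hex word from register numbers and opcodes:

```python
def enc_i(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def enc_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def enc_j(imm: int, rd: int, opcode: int = 0b1101111) -> int:
    """J-type: scatter a 21-bit signed offset into imm[20|10:1|11|19:12]."""
    imm &= 0x1FFFFF                        # 21-bit two's complement
    return (((imm >> 20) & 0x1) << 31 | ((imm >> 1) & 0x3FF) << 21
            | ((imm >> 11) & 0x1) << 20 | ((imm >> 12) & 0xFF) << 12
            | (rd << 7) | opcode)

ZERO, RA, A0, A1 = 0, 1, 10, 11
words = {
    0x00: enc_i(1, ZERO, 0, A0, 0b0010011),    # li a0,1 (addi)
    0x04: enc_i(2, ZERO, 0, A1, 0b0010011),    # li a1,2 (addi)
    0x08: enc_j(0x10 - 0x08, RA),              # jal ra, first_s (offset +8)
    0x10: enc_r(0, A1, A0, 0, A0, 0b0110011),  # add a0,a0,a1
    0x14: enc_i(0, RA, 0, ZERO, 0b1100111),    # ret (jalr zero,0(ra))
}
for addr, w in words.items():
    print(f"{addr:2x}: {w:08x}")
```

If a hand-encoded word disagrees with `objdump`, the field order or the offset is the first thing to check.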

---

## New Datapath Components

Four MUX changes needed to support `jal`/`jalr`:

| MUX | Chooses between | Controlled by | Why |
|-----|-----------------|---------------|-----|
| `PCsel` | `PC+4` vs. jump target | `PCsel` | Override sequential PC |
| `WDsel` | ALU result vs. `PC+4` | `WDsel` | Link writes `PC+4`, not ALU |
| `ALUSrcA` | `RD0` vs. `PC` | `ALUSrcA` | `jal` target needs current PC |
| `ALUSrcB` (wider) | `RD1` / `imm-I` / `imm-J` | `ALUSrcB` | Different instructions use different immediates |
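Each MUX is a pure select. The Python sketch below models the four MUXes as functions. The names are hypothetical. The input numbering assumes input 0 is the original Part 1 path and that `ALUSrcB` uses the `00`/`01`/`10` encoding from the decoder table:

```python
def pcsel_mux(sel: int, pc_plus_4: int, target: int) -> int:
    return target if sel else pc_plus_4        # 0 = sequential, 1 = jump

def wdsel_mux(sel: int, alu_result: int, pc_plus_4: int) -> int:
    return pc_plus_4 if sel else alu_result    # 1 = link value

def alusrca_mux(sel: int, rd0: int, pc: int) -> int:
    return pc if sel else rd0                  # 1 = PC (jal only)

def alusrcb_mux(sel: int, rd1: int, imm_i: int, imm_j: int) -> int:
    return (rd1, imm_i, imm_j)[sel]            # 00 = RD1, 01 = imm-I, 10 = imm-J (11 unused)

print(pcsel_mux(1, 12, 16), wdsel_mux(1, 16, 12))  # 16 12
```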

---

## How the ALU Computes Every Target

The **same ALU add** computes all jump targets — only the operands change:

```text
jal:   ALU.A = PC,   ALU.B = imm-J   →  target = PC + imm-J
jalr:  ALU.A = RD0,  ALU.B = imm-I   →  target = rs1 + imm
add:   ALU.A = RD0,  ALU.B = RD1     →  rd = rs1 + rs2
addi:  ALU.A = RD0,  ALU.B = imm-I   →  rd = rs1 + imm
```

<div class="info-box">
MUXes before and after the ALU specialize behavior — the ALU itself always adds.
</div>

---

## Datapath Sketch

```text
               ┌─────────┐
  PC ─────────►│ ALUSrcA │
  RD0 (rs1) ──►│   MUX   ├──A──►┌─────┐
               └─────────┘      │     │
                                │ ALU ├──R──►┌───────┐
               ┌─────────┐      │     │      │ WDsel ├──► RegFile WD
  RD1 (rs2) ──►│ ALUSrcB ├──B──►│     │  ┌──►│  MUX  │
  imm-I ──────►│   MUX   │      └─────┘  │   └───────┘
  imm-J ──────►│         │   PC+4 ───────┘   (link value)
               └─────────┘

               ┌───────┐
  PC+4 ───────►│ PCsel ├──► next PC
  ALU target ─►│  MUX  │
               └───────┘
```

`PC + 4` fans out to: default `PCsel` input and `WDsel` link input.
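One cycle of the sketch above can be written as a single function. It is a hypothetical model, not the Digital circuit. It assumes 32-bit values and an ALU that always adds. Note that `pc_plus_4` is used twice, matching the fan-out:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit datapath

def step(ctrl: dict, pc: int, rd0: int, rd1: int, imm_i: int, imm_j: int) -> tuple[int, int]:
    """One cycle of the Part 2 datapath: return (write_data, next_pc)."""
    pc_plus_4 = (pc + 4) & MASK32                    # fans out to PCsel and WDsel
    a = pc if ctrl["ALUSrcA"] else rd0               # ALUSrcA MUX
    b = (rd1, imm_i, imm_j)[ctrl["ALUSrcB"]]         # ALUSrcB MUX
    alu_r = (a + b) & MASK32                         # ALU always adds here
    wd = pc_plus_4 if ctrl["WDsel"] else alu_r       # WDsel MUX
    next_pc = alu_r if ctrl["PCsel"] else pc_plus_4  # PCsel MUX
    return wd, next_pc

jal_ctrl = {"ALUSrcA": 1, "ALUSrcB": 0b10, "WDsel": 1, "PCsel": 1}
print(step(jal_ctrl, pc=8, rd0=0, rd1=0, imm_i=0, imm_j=8))  # (12, 16)
```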

---

## Extended Decoder Table

| INUM | Instr | opcode | RFW | ALUOp | ALUSrcB | ALUSrcA | WDsel | PCsel |
|------|-------|--------|-----|-------|---------|---------|-------|-------|
| 0 | addi | 0010011 | 1 | 000 | 01 | 0 | 0 | 0 |
| 1 | add | 0110011 | 1 | 000 | 00 | 0 | 0 | 0 |
| 2 | **jal** | **1101111** | **1** | **000** | **10** | **1** | **1** | **1** |
| 3 | **jalr** | **1100111** | **1** | **000** | **01** | **0** | **1** | **1** |

Key: `ALUSrcA=1` for `jal` (uses PC); `ALUSrcA=0` for `jalr` (uses `rs1`).

Old rows get `ALUSrcA=WDsel=PCsel=0` — preserving Part 1 behavior.
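As a software stand-in for the decoder, the table can be stored as a dictionary keyed on opcode. This is a hypothetical sketch that replaces the comparators, priority encoder, and ROM. It ignores `funct3`/`funct7`, which is only enough because these four instructions have distinct opcodes:

```python
FIELDS = ("RFW", "ALUOp", "ALUSrcB", "ALUSrcA", "WDsel", "PCsel")

# opcode -> (name, RFW, ALUOp, ALUSrcB, ALUSrcA, WDsel, PCsel), one entry per table row
CONTROL = {
    0b0010011: ("addi", 1, 0b000, 0b01, 0, 0, 0),
    0b0110011: ("add",  1, 0b000, 0b00, 0, 0, 0),
    0b1101111: ("jal",  1, 0b000, 0b10, 1, 1, 1),
    0b1100111: ("jalr", 1, 0b000, 0b01, 0, 1, 1),
}

def decode(iw: int) -> tuple[str, dict]:
    """Look up control signals by opcode (bits 6:0)."""
    name, *signals = CONTROL[iw & 0x7F]
    return name, dict(zip(FIELDS, signals))

print(decode(0x008000EF))
# ('jal', {'RFW': 1, 'ALUOp': 0, 'ALUSrcB': 2, 'ALUSrcA': 1, 'WDsel': 1, 'PCsel': 1})
```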

---

## Reading the New Decoder Rows

**`jal` (INUM 2):**
- `ALUSrcA=1` → PC into ALU A input (to compute `PC + imm-J`)
- `ALUSrcB=10` → select `imm-J`
- `WDsel=1` → write `PC+4` (the link) into `rd`
- `PCsel=1` → next PC = ALU target

**`jalr` (INUM 3):**
- `ALUSrcA=0` → `RD0` (`rs1` = `ra` for `ret`) into ALU A
- `ALUSrcB=01` → select `imm-I` (0 for `ret`)
- `WDsel=1` → write link; `rd=x0` for `ret` so it's discarded
- `PCsel=1` → next PC = ALU target

---

## Decoder Circuit Changes

<div class="mermaid">
flowchart LR
    IW["Instruction Word"] --> SP["splitters: opcode, funct3, funct7"]
    SP --> CMP["comparators: one per instruction"]
    CMP --> PE["priority encoder → INUM"]
    PE --> ROM["control ROM indexed by INUM"]
    ROM --> SPL["output splitter"]
    SPL --> CTL["RFW, ALUOp, ALUSrcB,\nALUSrcA, WDsel, PCsel"]
</div>

Two concrete changes from Part 1:
1. Add comparators for opcodes `1101111` and `1100111`; wire to INUM 2 and 3
2. Widen ROM word — now 9 bits: `RFW(1) + ALUOp(3) + ALUSrcB(2) + ALUSrcA(1) + WDsel(1) + PCsel(1)`
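The 9-bit ROM word can be packed and unpacked with the sketch below. The bit order is an assumption: fields go most-significant-first in the order listed. Your ROM output splitter may order them differently, and the hex values change if it does:

```python
# (field, width) in the order listed above; first field = most significant bits (assumed)
WIDTHS = (("RFW", 1), ("ALUOp", 3), ("ALUSrcB", 2), ("ALUSrcA", 1), ("WDsel", 1), ("PCsel", 1))

def pack(fields: dict) -> int:
    word = 0
    for name, width in WIDTHS:
        word = (word << width) | (fields[name] & ((1 << width) - 1))
    return word

def unpack(word: int) -> dict:
    out = {}
    for name, width in reversed(WIDTHS):  # peel fields off the low end
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out

jal_word = pack({"RFW": 1, "ALUOp": 0, "ALUSrcB": 0b10, "ALUSrcA": 1, "WDsel": 1, "PCsel": 1})
print(f"{jal_word:09b} = 0x{jal_word:03x}")  # 100010111 = 0x117
```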

---

## Cycle-by-Cycle Trace

Memory layout: `addi` @0, `addi` @4, `jal` @8, `unimp` @12, `add` @16, `ret` @20

| Cycle | PC | Instruction | a0 | a1 | ra | Next PC |
|-------|----|-------------|----|----|----|---------|
| 1 | 0 | `addi a0,zero,1` | 1 | — | — | 4 |
| 2 | 4 | `addi a1,zero,2` | 1 | 2 | — | 8 |
| 3 | 8 | `jal ra,16` | 1 | 2 | **12** | **16** |
| 4 | 16 | `add a0,a0,a1` | **3** | 2 | 12 | 20 |
| 5 | 20 | `jalr zero,ra,0` | 3 | 2 | 12 | **12** |
| 6 | 12 | `unimp` | 3 | 2 | 12 | halt |

Final: `a0 = 3` (the sum), `ra = 12` (return address correctly preserved).
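A compact Python simulator reproduces this trace from the raw instruction words. It is a hypothetical model, not the Digital circuit, and it rests on several assumptions. The words come from the `objdump` listing. `unimp` is taken to be the 4-byte encoding `0xc0001073` and treated as "halt". The ALU always adds. Only the four opcodes in the decoder table are handled:

```python
MASK32 = 0xFFFFFFFF
UNIMP = 0xC0001073  # assumed 4-byte unimp encoding; used as "halt"
ROM = {0x00: 0x00100513, 0x04: 0x00200593, 0x08: 0x008000EF,
       0x0C: UNIMP,      0x10: 0x00B50533, 0x14: 0x00008067}

# opcode -> (RFW, ALUSrcB, ALUSrcA, WDsel, PCsel); ALUOp is always "add" here
CONTROL = {
    0b0010011: (1, 0b01, 0, 0, 0),  # addi
    0b0110011: (1, 0b00, 0, 0, 0),  # add
    0b1101111: (1, 0b10, 1, 1, 1),  # jal
    0b1100111: (1, 0b01, 0, 1, 1),  # jalr
}

def sext(value: int, bits: int) -> int:
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def imm_i(iw: int) -> int:
    return sext(iw >> 20, 12)

def imm_j(iw: int) -> int:
    raw = (((iw >> 31) & 1) << 20 | ((iw >> 21) & 0x3FF) << 1
           | ((iw >> 20) & 1) << 11 | ((iw >> 12) & 0xFF) << 12)
    return sext(raw, 21)

def run(rom: dict, max_cycles: int = 20) -> tuple[list, list]:
    regs, pc, trace = [0] * 32, 0, []
    for _ in range(max_cycles):
        iw = rom[pc]
        trace.append(pc)
        if iw == UNIMP:
            break
        rd, rs1, rs2 = (iw >> 7) & 0x1F, (iw >> 15) & 0x1F, (iw >> 20) & 0x1F
        rfw, srcb, srca, wdsel, pcsel = CONTROL[iw & 0x7F]
        a = pc if srca else regs[rs1]                  # ALUSrcA MUX
        b = (regs[rs2], imm_i(iw), imm_j(iw))[srcb]    # ALUSrcB MUX
        alu_r = (a + b) & MASK32
        pc_plus_4 = (pc + 4) & MASK32
        if rfw and rd != 0:                            # writes to x0 are discarded
            regs[rd] = pc_plus_4 if wdsel else alu_r   # WDsel MUX
        pc = alu_r if pcsel else pc_plus_4             # PCsel MUX
    return regs, trace

regs, trace = run(ROM)
print("PC trace:", trace)                  # PC trace: [0, 4, 8, 16, 20, 12]
print("a0 =", regs[10], "ra =", regs[1])   # a0 = 3 ra = 12
```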

---

## Cycle 3 Detail: jal

At PC = 8, executing `jal ra, first_s`:

- `ALUSrcA=1` → ALU A = PC = **8**
- `ALUSrcB=10` → ALU B = `imm-J` = **8**
- ALU computes: 8 + 8 = **16** (jump target)
- `WDsel=1` → write data = `PC + 4` = **12** → written into `ra`
- `PCsel=1` → next PC = **16**

<div class="highlight-box">
After this cycle: <code>ra = 12</code>, <code>PC = 16</code>. The call has been made and the return address is saved.
</div>

---

## Cycle 5 Detail: ret (jalr)

At PC = 20, executing `jalr zero, ra, 0`:

- `ALUSrcA=0` → ALU A = `RD0` = `ra` = **12**
- `ALUSrcB=01` → ALU B = `imm-I` = **0**
- ALU computes: 12 + 0 = **12** (return target)
- `WDsel=1` → would write `PC + 4 = 24` into `rd`, but `rd = x0` → **discarded**
- `PCsel=1` → next PC = **12** (the `unimp`)

<div class="highlight-box">
The function has returned. Control is back at the instruction after the original <code>jal</code>.
</div>

---

## Common Bug #1: Wrong ALUSrcA

**Symptom**: `jal` jumps correctly but `ret` goes to the wrong address (or vice versa)

**Cause**: `ALUSrcA` stuck at the same value for both `jal` and `jalr`

| Instruction | Correct `ALUSrcA` | ALU A input |
|-------------|-------------------|-------------|
| `jal` | **1** | PC (to compute `PC + imm-J`) |
| `jalr` | **0** | `RD0` / `rs1` (to compute `rs1 + imm`) |

**Fix**: `ALUSrcA = 1` only for `jal`; `= 0` for `jalr` and all other instructions.
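The effect of a swapped `ALUSrcA` can be computed directly from the values in the cycle trace. This is a small hypothetical sketch. For the `jal` word `0x008000ef`, bits 19:15 (the `rs1` field position) happen to be 0, so with `ALUSrcA=0` the ALU reads `x0 = 0`:

```python
def target(alusrca: int, pc: int, rd0: int, imm: int) -> int:
    a = pc if alusrca else rd0  # ALUSrcA MUX
    return a + imm              # ALU add

# jal at PC=8 (imm-J = 8, RD0 = x0 = 0); ret at PC=20 (RD0 = ra = 12, imm-I = 0)
print("jal correct:", target(1, pc=8, rd0=0, imm=8))    # 16
print("jal buggy:  ", target(0, pc=8, rd0=0, imm=8))    # 8
print("ret correct:", target(0, pc=20, rd0=12, imm=0))  # 12
print("ret buggy:  ", target(1, pc=20, rd0=12, imm=0))  # 20
```

In both buggy cases the next PC is the jump instruction's own address, so the processor spins in place on that instruction.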

---

## Common Bug #2: Wrong WDsel (Link Bug)

**Symptom**: `ret` returns to `first_s` instead of `unimp` — an infinite loop

**Cause**: `WDsel = 0` for `jal`, so `rd` gets the ALU result (the jump target) instead of `PC + 4`

```text
jal WDsel=0:  ra ← target (first_s address)  ← WRONG
jal WDsel=1:  ra ← PC + 4 = return address   ← correct
```

**Fix**: `WDsel = 1` for both `jal` and `jalr`.

**Diagnosis tip**: After the `jal` cycle, check `ra` on the dashboard — it must equal the address *after* the `jal`, not the function entry.

---

## Common Bug #3: PCsel Never Selects Target

**Symptom**: `jal` "executes" but the PC just advances by 4; function body never entered

**Cause**: `PCsel = 0` for the jump rows, or the MUX inputs are swapped

**Fix**: `PCsel = 1` for both `jal` and `jalr`; verify MUX input 0 = `PC+4`, input 1 = target

---

## Common Bug #4: Wrong Immediate

**Symptom**: `jal` lands at a plausible but wrong address

**Cause**: `ALUSrcB` selects `imm-I` for `jal` (should be `imm-J`) or vice versa

**Fix**: `jal → ALUSrcB=10` (imm-J); `jalr → ALUSrcB=01` (imm-I)

---

## Common Bug #5: Forgot to Backfill Old Rows

**Symptom**: `addi`/`add` break after you add the new MUXes

**Cause**: `ALUSrcA`, `WDsel`, `PCsel` are left as `x` or `1` for the old rows

**Rule**: When you add a new control output, set it to its **inert value (0)** for every existing instruction row.

```text
addi: ALUSrcA=0  WDsel=0  PCsel=0   ← preserves Part 1 behavior
add:  ALUSrcA=0  WDsel=0  PCsel=0
```

This works only if every MUX is wired so that input 0 is the original Part 1 path.
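If you keep a copy of the decoder spreadsheet as data, a few lines can flag unfilled cells before you build the ROM. This is a hypothetical check. `None` stands for an `x` cell, and the sample table deliberately leaves one old-row cell empty:

```python
FIELDS = ("RFW", "ALUOp", "ALUSrcB", "ALUSrcA", "WDsel", "PCsel")
NEW_COLUMNS = ("ALUSrcA", "WDsel", "PCsel")

# Rows as they might look mid-edit; None = an unfilled ("x") cell
table = {
    "addi": (1, 0b000, 0b01, 0, 0, 0),
    "add":  (1, 0b000, 0b00, None, 0, 0),  # forgot ALUSrcA
    "jal":  (1, 0b000, 0b10, 1, 1, 1),
    "jalr": (1, 0b000, 0b01, 0, 1, 1),
}

def unfilled_old_rows(table: dict, old_rows: tuple = ("addi", "add")) -> list[str]:
    problems = []
    for name in old_rows:
        row = dict(zip(FIELDS, table[name]))
        for col in NEW_COLUMNS:
            if row[col] != 0:  # old rows must hold the inert value 0
                problems.append(f"{name}.{col} = {row[col]!r}")
    return problems

print(unfilled_old_rows(table))  # ['add.ALUSrcA = None']
```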

---

## Incremental Development Workflow

<div class="mermaid">
flowchart TD
    A["1. Pick instructions: jal then jalr"] --> B["2. Add MUXes: PCsel, WDsel, ALUSrcA"]
    B --> C["3. Wire datapath: PC, PC+4, imm-J"]
    C --> D["4. Extend decoder: new rows + columns"]
    D --> E["5. Test in Digital: single-step + dashboard"]
    E -->|"bug"| C
    E -->|"passes"| F["commit lab10-part2.dig"]
</div>

---

## Debugging in Digital

Practical tips from the guides:

- **Add dashboard probes** for: `PC`, `PC+4`, `iw`, `RS1`, `imm-I`, `imm-J`, ALU result, `PCsel`
- **Single-step with `objdump` open** — match each cycle to expected register values
- **Watch `ra`** — after the `jal` cycle it must equal the `unimp` address; if it equals `first_s`, you have a `WDsel` bug
- **Use `EN`** — press play, select `PROG`, toggle `EN` to 1 so PC starts at 0
- **Paste the `.dig` test** into your processor to run the autograder test directly in Digital

---

## Key Concepts Recap

| Concept | Definition |
|---------|------------|
| **Link** | `rd = PC + 4` saved before a jump |
| **`jal`** | J-type; `rd=PC+4`, `PC=PC+imm` (PC-relative) |
| **`jalr`** | I-type; `rd=PC+4`, `PC=rs1+imm` (register-relative) |
| **`ret`** | `jalr x0, ra, 0` — discard link, jump to `ra` |
| **`PCsel`** | MUX: `PC+4` vs. jump target → next PC |
| **`WDsel`** | MUX: ALU result vs. `PC+4` → register write data |
| **`ALUSrcA`** | MUX: `RD0` vs. `PC` → ALU A input (`1` for `jal` only) |

---

## Summary

1. **Functions need a link.** `jal` saves `PC+4` into `rd` before jumping — that saved address is the return address.

2. **`jal` is PC-relative; `jalr` is register-relative.** Same link behavior, different target source.

3. **`ret` = `jalr x0, ra, 0`.** Discard the link (`rd=x0`), jump exactly to `ra` (`imm=0`).

4. **Three new MUXes.** `PCsel` (next PC), `WDsel` (link vs. ALU), `ALUSrcA` (PC vs. register); plus widened `ALUSrcB`.

5. **Two decoder rows, three new columns.** Backfill `0` for `addi`/`add` to preserve Part 1.

6. **Most bugs are control-line mix-ups.** Wrong `ALUSrcA` → bad target; wrong `WDsel` → `ret` loops; wrong `PCsel` → no jump at all.