Lab: Processor JAL and JALR¶

Overview¶

This hands-on lab takes our single-cycle RISC-V processor from "can execute a straight line of arithmetic" to "can call a function and return from it." Up to this point the processor only ran I-type and R-type instructions, so the program counter advanced to PC + 4 every cycle and never did anything else. Function calls require control flow: the ability to jump somewhere else and, crucially, to remember where to come back to. We implement the two RISC-V instructions that make this possible: jal (jump and link, the call primitive) and jalr (jump and link register, the ret primitive). The work is partly conceptual (what these instructions mean at the register/PC level) and partly Digital schematic engineering (the new MUXes and datapath wires, plus the extra control lines coming out of the instruction decoder). By the end you will be able to run a program that does jal first_s ... ret inside your own processor circuit, which is exactly what Lab 10 Part 2 asks for.

Learning Objectives¶

Describe precisely what jal and jalr do to the destination register and the PC
Explain why a return address (PC + 4) must be saved in a link register before a jump
Decode the J-type and I-type instruction formats well enough to compute jump targets by hand
Recognize call, j, and ret as pseudo-instructions built on jal/jalr
Add the PCsel, WDsel, and ALUSrcA MUXes and the widened ALUSrcB MUX to the datapath
Extend the instruction-decoder spreadsheet with new control outputs and keep existing rows correct
Trace a jal/ret program cycle-by-cycle through the processor and predict every register and PC value
Diagnose the most common JAL/JALR wiring and decoder bugs

Prerequisites¶

A working Lab 09 / Part 1 processor that executes addi, add, and li (the first_s program)
The Digital schematic editor installed and able to open and simulate your .dig circuits
RISC-V instruction formats from Lab 03 (R-type, I-type, J-type bit layouts)
The ABI calling conventions: ra is the return-address register, a0-a7 hold arguments and results
Comfort with MUXes, ROMs, priority encoders, and comparators from Lab 09
The decoder-spreadsheet methodology from Processor Guide Part 2

1. Why We Need Jumps at All¶

Every program our processor has run so far is a straight line. The PC starts at the first instruction and increments by 4 each cycle:

flowchart LR
    A["PC = 0<br/>li a0, 1"] --> B["PC = 4<br/>li a1, 2"]
    B --> C["PC = 8<br/>add a2, a0, a1"]
    C --> D["PC = 12<br/>unimp (halt)"]

This is fine for arithmetic, but real programs are organized into functions, and calling a function means two things the straight-line model cannot do:

Transfer control to an instruction that is not PC + 4 (jump to the function body).
Remember where we came from so that when the function finishes, control can return to the instruction after the call.

The second requirement is the subtle one. A jump that forgets where it came from is a one-way trip. The trick that makes functions work is the link: at the moment of the jump, we save the address of the next sequential instruction (PC + 4) into a register. That saved address is the return address. RISC-V's convention is to keep it in register ra (which is x1).

flowchart TD
    M0["main: PC=0  li a0,1"] --> M1["PC=4  li a1,2"]
    M1 --> M2["PC=8  jal first_s"]
    M2 -->|"ra = PC+4 = 12<br/>PC = address of first_s"| F0["first_s: add a0,a0,a1"]
    F0 --> F1["ret"]
    F1 -->|"PC = ra = 12"| M3["PC=12  unimp (halt)"]

The two labeled transitions (the arrow into first_s and the arrow back to PC=12) are exactly what jal and ret (which is jalr) implement. jal does the call-and-remember, ret does the return.

2. JAL: Jump And Link (the `call` primitive)¶

The handwritten lab notes summarize jal in three lines, which we will expand. jal is a J-type instruction. It does two things in a single cycle:

jal rd, imm        # rd is the link/destination register, imm is a PC-relative offset

Step 1 (link):  rd  <- PC + 4        # save return address into rd
Step 2 (jump):  PC  <- PC + imm      # jump PC-relative by the J-type immediate

Two important points:

The destination register rd receives PC + 4, the address of the instruction after the jal. This is the "link."
The PC is updated to PC + imm. The immediate is PC-relative: the jump target is computed by adding a signed offset to the current PC, not by loading an absolute address.

The `call` and `j` pseudo-instructions¶

You rarely write raw jal. The assembler gives you friendlier names that expand to jal:

You write	Assembler emits	Meaning
`call first_s`	`jal ra, first_s`	Save return addr in `ra`, jump to `first_s`
`jal first_s`	`jal ra, first_s`	Same as `call` (rd defaults to `ra`)
`j label`	`jal zero, label`	Plain jump; throw the link away into `x0`

The j (plain jump) case is the key insight into why a single instruction covers both calls and gotos: if you do not need the return address, you link into x0 (the zero register), which discards the write. So j is just jal that does not care about the link.
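
The call-versus-j distinction can be sketched in a few lines of Python. This is a minimal behavioral model, not the lab's circuit: the function name `jal`, the 32-entry list `regs`, and the constant `RA` are illustrative names, and 32-bit wraparound is ignored.

```python
def jal(regs, pc, rd, imm):
    """Model of jal: link PC+4 into rd, then return the next PC (PC + imm)."""
    if rd != 0:              # x0 is hardwired to zero, so a write to it is dropped
        regs[rd] = pc + 4
    return pc + imm

RA = 1                       # ra is x1

# call / jal ra: the link is kept in ra
regs = [0] * 32
next_pc = jal(regs, pc=8, rd=RA, imm=8)
print(next_pc, regs[RA])     # 16 12

# j / jal zero: same jump, but the link is discarded
regs = [0] * 32
next_pc = jal(regs, pc=8, rd=0, imm=8)
print(next_pc, regs[RA])     # 16 0
```

Both calls take the same path through the function; only the `rd` argument differs, which mirrors the point that `j` needs no extra hardware.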

J-type encoding¶

jal uses the J-type format. The instruction word carries 20 immediate bits, imm[20:1], scrambled across several fields. They must be reassembled and sign-extended, and the resulting offset is always a multiple of 2 because imm[0] is not stored and is always 0:

J-type:  | imm[20] | imm[10:1] | imm[11] | imm[19:12] |   rd   | opcode |
bits:    |   31    |  30:21    |   20    |   19:12    | 11:7   |  6:0   |

opcode for jal = 0b1101111
imm = sign_extend( imm[20]<<20 | imm[19:12]<<12 | imm[11]<<11 | imm[10:1]<<1 )

In hardware your ImmDecoder already produces this imm-J value from the instruction word. The processor never reassembles bits by hand at runtime; the decoder does it combinationally.
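
For checking an encoding by hand, the reassembly formula above can be written out in Python. This is a software sketch of what the ImmDecoder does combinationally; the function name `decode_j_imm` is an illustrative choice, not a course tool.

```python
def decode_j_imm(iw):
    """Reassemble and sign-extend the J-type immediate from a 32-bit instruction word."""
    imm20    = (iw >> 31) & 0x1      # instruction bit 31     -> imm[20]
    imm10_1  = (iw >> 21) & 0x3FF    # instruction bits 30:21 -> imm[10:1]
    imm11    = (iw >> 20) & 0x1      # instruction bit 20     -> imm[11]
    imm19_12 = (iw >> 12) & 0xFF     # instruction bits 19:12 -> imm[19:12]
    imm = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    if imm20:                        # sign bit set: the 21-bit value is negative
        imm -= 1 << 21
    return imm

print(decode_j_imm(0x008000EF))      # jal ra, +8    -> 8
print(decode_j_imm(0xFF9FF06F))      # jal zero, -8  -> -8
```

The second word shows why sign extension matters: a backward jump sets imm[20], and without the subtraction the offset would decode as a large positive number.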

3. JALR: Jump And Link Register (the `ret` primitive)¶

jalr is the register-based cousin of jal. The handwritten notes describe it as "jump to register value and link." It is an I-type instruction:

jalr rd, rs1, imm      # rd is link reg, rs1 holds a base address, imm is a small offset

Step 1 (link):  rd  <- PC + 4        # same link behavior as jal
Step 2 (jump):  PC  <- rs1 + imm     # jump to a REGISTER value plus immediate

The difference from jal is where the target comes from:

Instruction	Target computation	Addressing style
`jal`	`PC = PC + imm`	PC-relative (offset is fixed)
`jalr`	`PC = rs1 + imm`	Register-relative (computed)

Because the target is in a register, jalr can jump to an address that was computed at runtime, which is exactly what a return needs: the return address was stored in ra by an earlier jal.

The `ret` pseudo-instruction¶

ret is the most common use of jalr. The handwritten notes call out the two special values that make it a return:

ret      expands to     jalr x0, ra, 0

rd is x0 (the notes' "always 0 for ret"). We do not want to overwrite the link this time, so we discard PC + 4 into the zero register.
imm is 0 (the notes' "imm is 0 for ret"). We jump exactly to the address in ra, with no offset.

So ret reduces to PC <- ra + 0 = ra, and that is how a function returns to its caller. The ra it reads was placed there by the call/jal that invoked the function.
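
The same behavior as a minimal Python sketch, assuming a 32-entry register list and illustrative names (`jalr`, `regs`, `RA`). It follows this lab's simplified definition `PC = rs1 + imm`; the RISC-V specification additionally clears bit 0 of the target, which makes no difference for the word-aligned return addresses used here.

```python
def jalr(regs, pc, rd, rs1, imm):
    """Model of jalr: link PC+4 into rd, return the next PC (rs1 + imm)."""
    target = regs[rs1] + imm     # read rs1 first, in case rd and rs1 are the same register
    if rd != 0:                  # writes to x0 are discarded
        regs[rd] = pc + 4
    return target

RA = 1                           # ra is x1
regs = [0] * 32
regs[RA] = 12                    # value an earlier jal at PC 8 would have linked

next_pc = jalr(regs, pc=20, rd=0, rs1=RA, imm=0)   # ret == jalr x0, ra, 0
print(next_pc, regs[RA])         # 12 12
```

Because `rd` is x0, `ra` still holds 12 after the return, which is what "we do not want to overwrite the link" means in practice.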

flowchart LR
    subgraph CALL["jal ra, first_s"]
        direction TB
        A1["ra <- PC+4"] --> A2["PC <- PC+imm"]
    end
    subgraph RET["ret = jalr x0, ra, 0"]
        direction TB
        B1["x0 <- PC+4 (discarded)"] --> B2["PC <- ra + 0"]
    end
    CALL --> RET

4. JAL vs JALR Side by Side¶

It is worth memorizing this comparison; almost every JAL/JALR bug comes from mixing up which field drives the PC.

Property	`jal`	`jalr`
Format	J-type	I-type
opcode	`0b1101111`	`0b1100111`
Link (write to `rd`)	`rd = PC + 4`	`rd = PC + 4`
PC update	`PC = PC + imm`	`PC = rs1 + imm`
Immediate source	`imm-J` (20-bit)	`imm-I` (12-bit)
Uses a source reg?	No	Yes (`rs1`)
Common alias	`call`, `j`	`ret`
Target known at...	assemble time	run time

The shared behavior is the link: both write PC + 4 into rd. The difference is purely in how the PC target is formed: a fixed PC-relative offset for jal, a runtime register value plus offset for jalr.

5. The Lab 10 Programs¶

The lab is incremental. Part 1 (already done) runs the straight-line program; Part 2 adds the function call.

Part 1 — straight line (no jumps):

first_s:
    li a0, 1
    li a1, 2
    add a2, a0, a1
    unimp

Part 2 — a real function call and return:

main:
    li a0, 1
    li a1, 2
    jal first_s        # ra = PC+4, jump into first_s
    unimp              # control returns HERE after ret
first_s:
    add a0, a0, a1     # a0 = 1 + 2 = 3
    ret                # jalr x0, ra, 0  -> PC = ra

A few things to notice that bite people when preparing programs for the processor (these come straight from the project guidance):

Use jal, not call. Both assemble to the same machine code, but using jal explicitly matches the instruction your decoder recognizes.
There must be a main. Execution starts at the top of instruction memory.
unimp halts. It is your end-of-program marker, and after ret we land back on the unimp that follows the jal.
li is a pseudo-instruction. li a0, 1 assembles to addi a0, zero, 1, which your Part 1 decoder already handles.

Building the `.hex` and checking the encoding¶

Assemble the program and disassemble it so you can see the exact instruction words your ROM will hold and confirm the jump offsets:

# assemble the .s to an object file
riscv64-unknown-elf-as -o lab10-part2.o lab10-part2.s

# disassemble to see opcodes, fields, and the jal/ret expansion
riscv64-unknown-elf-objdump -d lab10-part2.o

# generate the .hex you load into the Digital ROM (course tool)
python3 makerom3.py lab10-part2.o > lab10-part2.hex

A representative objdump excerpt (addresses and exact encodings depend on your assembler) shows how the pseudo-instructions expand:

0:   00100513    li   a0,1        # addi a0,zero,1
4:   00200593    li   a1,2        # addi a1,zero,2
8:   008000ef    jal  ra,10       # jal ra, first_s  (offset +8)
c:   c0001073    unimp
10:  00b50533    add  a0,a0,a1    # first_s
14:  00008067    ret              # jalr zero,0(ra)

Notice that ret disassembles as jalr zero,0(ra) — rd = zero, rs1 = ra, imm = 0, exactly the two special values from Section 3.
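
If you want to cross-check a word from the listing without the toolchain, the field layouts can be packed by hand. The sketch below is not part of the course tooling; `encode_i` and `encode_j` are hypothetical helper names, and the register numbers used are the standard ABI ones (a0 = x10, ra = x1).

```python
def encode_i(opcode, rd, funct3, rs1, imm):
    """Pack an I-type instruction word (imm is a signed 12-bit value)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_j(opcode, rd, imm):
    """Pack a J-type instruction word (imm is a signed, even, 21-bit offset)."""
    imm &= 0x1FFFFF
    return (((imm >> 20) & 0x1) << 31 | ((imm >> 1) & 0x3FF) << 21 |
            ((imm >> 11) & 0x1) << 20 | ((imm >> 12) & 0xFF) << 12 |
            (rd << 7) | opcode)

print(f"{encode_i(0b0010011, 10, 0, 0, 1):08x}")   # addi a0,zero,1   -> 00100513
print(f"{encode_j(0b1101111, 1, 8):08x}")          # jal ra,+8        -> 008000ef
print(f"{encode_i(0b1100111, 0, 0, 1, 0):08x}")    # jalr zero,0(ra)  -> 00008067
```

If a word in your own objdump output differs from what these helpers produce, compare the offset first: a different program layout changes the jal immediate but not the opcode or rd fields.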

6. New Datapath Components¶

Recall the Part 1 datapath: instruction memory feeds the decoders, the register file feeds the ALU, the ALU result writes back, and the PC always advances by PC + 4. To support jal/jalr we add three new MUXes and widen one existing MUX. The Part 2 guide lists exactly these additions.

New/changed MUX	Chooses between	Controlled by	Why we need it
`PCsel`	`PC + 4` vs. branch/jump target	`PCsel`	A jump must override the default `PC + 4`
`WDsel`	ALU result vs. `PC + 4`	`WDsel`	The link writes `PC + 4` (not the ALU) into `rd`
`ALUSrcA`	register `RD0` vs. `PC`	`ALUSrcA`	`jal` target is `PC + imm`, so the ALU's A input is PC
`ALUSrcB` (wider)	`RD1` vs. `imm-I` vs. `imm-J`	`ALUSrcB`	Different jumps use different immediates

The intuition for each:

PCsel: Until now the next PC was hardwired to PC + 4. A jump produces a target address that must win instead. PCsel selects between the sequential PC + 4 and the computed target. For jal/jalr it always selects the target; for ordinary instructions it selects PC + 4.
WDsel (write-data select): For add/addi the register file's write data comes from the ALU. For the link step of jal/jalr, the write data is PC + 4. WDsel picks which one reaches the register file's WD input.
ALUSrcA: The ALU normally takes RD0 (the value of rs1) on its A input. To compute jal's target PC + imm, we instead feed PC into A. ALUSrcA chooses register-vs-PC for the ALU's first operand.
Widened ALUSrcB: In Part 1 this was a single bit (register vs. immediate). Now it needs to choose among RD1, the I-type immediate, and the J-type immediate, so it grows to a 2-bit selector.

How each instruction computes its target in the ALU¶

The clean realization is that the same ALU computes every jump target; only the operands change:

jal:   ALU.A = PC,    ALU.B = imm-J,   ALUOp = add   ->  target = PC + imm-J
jalr:  ALU.A = RD0,   ALU.B = imm-I,   ALUOp = add   ->  target = rs1 + imm
add:   ALU.A = RD0,   ALU.B = RD1,     ALUOp = add   ->  rd = rs1 + rs2
addi:  ALU.A = RD0,   ALU.B = imm-I,   ALUOp = add   ->  rd = rs1 + imm

So the ALU does one add in all four cases. The MUXes in front of it (ALUSrcA, ALUSrcB) and behind it (PCsel, WDsel) are what specialize the behavior.
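
A minimal combinational model of these four MUXes around an add-only ALU, written in Python. The select encodings follow the descriptions in this section (ALUSrcA=1 picks PC, ALUSrcB 00/01/10 picks RD1/imm-I/imm-J, WDsel=1 picks PC+4, PCsel=1 picks the target); the function name `datapath` and its argument names are illustrative.

```python
def datapath(pc, rd0, rd1, imm_i, imm_j, alu_src_a, alu_src_b, wd_sel, pc_sel):
    """Return (write_data, next_pc) for one cycle, given operands and control lines."""
    a = pc if alu_src_a else rd0              # ALUSrcA: 0 = RD0, 1 = PC
    b = (rd1, imm_i, imm_j)[alu_src_b]        # ALUSrcB: 0 = RD1, 1 = imm-I, 2 = imm-J
    alu = a + b                               # the single add shared by all four cases
    pc_plus_4 = pc + 4                        # dedicated PC adder
    wd = pc_plus_4 if wd_sel else alu         # WDsel: 0 = ALU result, 1 = PC+4 (link)
    next_pc = alu if pc_sel else pc_plus_4    # PCsel: 0 = PC+4, 1 = ALU target
    return wd, next_pc

# addi a0,zero,1 at PC 0: rd0 = 0, imm-I = 1
print(datapath(0, 0, 0, 1, 0, alu_src_a=0, alu_src_b=1, wd_sel=0, pc_sel=0))    # (1, 4)
# jal at PC 8 with imm-J = 8
print(datapath(8, 0, 0, 0, 8, alu_src_a=1, alu_src_b=2, wd_sel=1, pc_sel=1))    # (12, 16)
# ret at PC 20 with ra = 12 on RD0 and imm-I = 0
print(datapath(20, 12, 0, 0, 0, alu_src_a=0, alu_src_b=1, wd_sel=1, pc_sel=1))  # (24, 12)
```

In the `ret` case the write data is 24, but the destination is x0, so the register file drops it; only the next PC of 12 has any effect.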

Datapath sketch¶

                +--------+
        PC ---->|        |
                | ALUSrcA|---A--->+-----+
   RD0 (rs1)--->|  MUX   |        |     |
                +--------+        | ALU |---R--+--> WDsel MUX --> RegFile WD
                                  |     |      |        ^
   RD1 (rs2)--->+--------+  ----B>|     |      |        |
   imm-I ------>| ALUSrcB|        +-----+   PC+4 -------+  (link value)
   imm-J ------>|  MUX   |
                +--------+

      PC+4 ---->+--------+
                | PCsel  |---> next PC  (into PC register)
   ALU target ->|  MUX   |
                +--------+

The PC + 4 value is itself produced by a small dedicated adder (the same one that drove the PC in Part 1). It now fans out to two places, the default (input 0) side of the PCsel MUX and the link input of the WDsel MUX; nothing else about it changes.

7. Extending the Instruction Decoder¶

The InstDecoder is the spreadsheet-driven control unit from Part 2. Each supported instruction gets a row; the row's Inputs (opcode, funct3, funct7) identify the instruction, and the row's Control outputs drive the datapath. To add jal and jalr we (1) add two rows and (2) add columns for the new control lines, then (3) backfill the new columns for the existing rows.

The cardinal rule from the guide: when you add a new control output, set it to its inert value (usually 0) for every existing instruction. MUXes are wired so input 0 is the original Part 1 behavior, so writing 0 in the new columns leaves addi/add working exactly as before.

A decoder table for the four instructions in the Part 2 program looks like this:

INUM	Instr	Fmt	opcode	funct3	funct7	RFW	ALUSrcB	ALUSrcA	WDsel	PCsel
0	addi	I	0010011	000	xxxxxxx	1	01	0	0	0
1	add	R	0110011	000	0000000	1	00	0	0	0
2	jal	J	1101111	xxx	xxxxxxx	1	10	1	1	1
3	jalr	I	1100111	000	xxxxxxx	1	01	0	1	1

Reading the new rows:

jal (INUM 2): RFW=1 so we write the link into rd. ALUSrcA=1 selects PC into the ALU's A input. ALUSrcB=10 selects imm-J. WDsel=1 so the write data is PC + 4 (the link), not the ALU result. PCsel=1 so the PC takes the jump target.
jalr (INUM 3): ALUSrcA=0 keeps RD0 (= rs1 = ra for a ret) on the ALU's A input. ALUSrcB=01 selects imm-I (which is 0 for ret). WDsel=1 writes the link; PCsel=1 takes the jump. For ret the destination is x0, so the write is harmlessly discarded even though RFW=1.
addi, add (INUM 0, 1): the three new columns (ALUSrcA, WDsel, PCsel) are all 0, preserving Part 1 behavior. Note ALUSrcB widened to two bits, so the old "1" becomes "01" and the old "0" becomes "00".
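
The table can also be written down as data, which makes the backfill rule easy to check mechanically. This Python sketch is a simplification: it matches on the opcode alone (enough to tell these four instructions apart), whereas the real decoder also compares funct3 and funct7. The names `Control`, `DECODER`, and `decode` are illustrative.

```python
from typing import NamedTuple

class Control(NamedTuple):
    rfw: int
    alu_src_b: int
    alu_src_a: int
    wd_sel: int
    pc_sel: int

# Rows copied from the decoder table, keyed by opcode (instruction bits 6:0).
DECODER = {
    0b0010011: ("addi", Control(1, 0b01, 0, 0, 0)),
    0b0110011: ("add",  Control(1, 0b00, 0, 0, 0)),
    0b1101111: ("jal",  Control(1, 0b10, 1, 1, 1)),
    0b1100111: ("jalr", Control(1, 0b01, 0, 1, 1)),
}

def decode(iw):
    """Look up the instruction name and control outputs for an instruction word."""
    return DECODER[iw & 0x7F]

# Backfill check: the Part 1 instructions must leave every new control line at 0.
for name, ctl in DECODER.values():
    if name in ("addi", "add"):
        assert (ctl.alu_src_a, ctl.wd_sel, ctl.pc_sel) == (0, 0, 0), name

name, ctl = decode(0x008000EF)       # the jal word from the objdump listing
print(name, ctl.alu_src_a, ctl.alu_src_b, ctl.wd_sel, ctl.pc_sel)   # jal 1 2 1 1
```

Running a check like this before exporting the spreadsheet catches a forgotten backfill before it shows up as a mysteriously broken addi.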

From spreadsheet to circuit¶

The decoder circuit pattern is unchanged from Part 2; you just have more of everything:

flowchart LR
    IW["Instruction Word"] --> SP["splitters:<br/>opcode, funct3, funct7"]
    SP --> CMP["comparators:<br/>one per instruction"]
    CMP --> PE["priority encoder<br/>-> INUM"]
    PE --> ROM["control ROM<br/>indexed by INUM"]
    ROM --> SPL["splitter:<br/>break ROM word into<br/>control lines"]
    SPL --> CTL["RFW, ALUOp, ALUSrcB,<br/>ALUSrcA, WDsel, PCsel"]

Two concrete changes versus Part 1:

More comparators / wider priority encoder. You add comparators that match opcode == 1101111 (jal) and opcode == 1100111 (jalr), and feed them into the priority encoder so they map to INUM 2 and 3.
Wider ROM word. Each ROM entry must now hold all the control bits. With RFW(1) + ALUOp(3) + ALUSrcB(2) + ALUSrcA(1) + WDsel(1) + PCsel(1) you need a 9-bit ROM word. The output splitter must extract each field at the correct bit positions, and the bit order must match the spreadsheet's "Output bits" column exactly.
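
To see how the field widths add up and how a splitter mismatch would scramble the control lines, here is a small Python sketch that packs and unpacks a control word. The bit order is an assumption made for illustration (RFW in the lowest bit, then each later field above it); use the order in your own spreadsheet's "Output bits" column. ALUOp = 000 is taken to mean "add", as in the jalr row of Problem 4, and is assumed to apply to jal as well.

```python
# (name, width) listed from the least-significant field upward. ASSUMED order.
FIELDS = [("RFW", 1), ("ALUOp", 3), ("ALUSrcB", 2),
          ("ALUSrcA", 1), ("WDsel", 1), ("PCsel", 1)]

def pack(values):
    """Pack a dict of control values into one ROM word."""
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} does not fit in {width} bit(s)"
        word |= v << shift
        shift += width
    return word

def unpack(word):
    """Split a ROM word back into named control values (what the splitter does)."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out

jal_row = {"RFW": 1, "ALUOp": 0, "ALUSrcB": 0b10, "ALUSrcA": 1, "WDsel": 1, "PCsel": 1}
word = pack(jal_row)
print(sum(width for _, width in FIELDS))   # 9
print(f"{word:09b}")                       # 111100001
print(unpack(word) == jal_row)             # True
```

Reordering `FIELDS` in only one of `pack` or `unpack` reproduces the "several control lines wrong at once" symptom of a splitter that disagrees with the spreadsheet.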

8. Cycle-by-Cycle Trace of the Part 2 Program¶

Let us execute the Part 2 program one clock cycle at a time. Assume instruction memory is laid out at byte addresses 0, 4, 8, ... and unimp sits at address 12 (0xc), first_s at 16 (0x10).

addr  instruction
  0:  addi a0, zero, 1      (li a0,1)
  4:  addi a1, zero, 2      (li a1,2)
  8:  jal  ra, 16           (jal first_s ; offset = 16-8 = +8)
 12:  unimp
 16:  add  a0, a0, a1       (first_s)
 20:  jalr zero, ra, 0      (ret)

Cycle	PC	Instruction	What happens	a0	a1	ra	next PC
1	0	`addi a0,zero,1`	ALU: 0+1 → a0; WDsel=ALU; PCsel=PC+4	1	–	–	4
2	4	`addi a1,zero,2`	ALU: 0+2 → a1	1	2	–	8
3	8	`jal ra,16`	link: ra ← PC+4 = 12; ALU: PC+imm = 8+8 = 16; PCsel=target	1	2	12	16
4	16	`add a0,a0,a1`	ALU: 1+2 → a0	3	2	12	20
5	20	`jalr zero,ra,0`	link: x0 ← PC+4 (discarded); ALU: ra+0 = 12+0 = 12; PCsel=target	3	2	12	12
6	12	`unimp`	halt	3	2	12	–

Trace the two control-flow cycles carefully:

Cycle 3 (jal): ALUSrcA=1 puts PC (8) on the ALU A input; ALUSrcB=10 puts imm-J (8) on B; the ALU adds to get 16. Simultaneously WDsel=1 routes PC + 4 (12) to the register file write port, and RFW=1 writes it into ra. PCsel=1 makes the next PC the ALU target, 16.
Cycle 5 (ret = jalr): ALUSrcA=0 keeps ra (12) on the ALU A input; ALUSrcB=01 puts imm-I (0) on B; the ALU adds to get 12. WDsel=1 would write PC + 4 into the destination, but the destination is x0, so nothing observable changes. PCsel=1 makes the next PC the ALU target, 12 — which is the unimp right after the jal. The function has returned.

The program ends with a0 = 3, the sum computed inside first_s, and the PC correctly back at the unimp after the call.
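
The trace can be reproduced with a short behavioral simulator. This Python sketch models only the four instructions in the program plus unimp as a halt; it works on pre-decoded tuples rather than machine words, and the names `PROGRAM` and `run` are illustrative.

```python
A0, A1, RA, ZERO = 10, 11, 1, 0      # ABI register numbers

# byte address -> (op, rd, rs1, rs2-or-immediate)
PROGRAM = {
    0:  ("addi", A0, ZERO, 1),       # li a0, 1
    4:  ("addi", A1, ZERO, 2),       # li a1, 2
    8:  ("jal",  RA, None, 8),       # jal first_s (offset +8)
    12: ("unimp", None, None, None), # halt
    16: ("add",  A0, A0, A1),        # first_s
    20: ("jalr", ZERO, RA, 0),       # ret
}

def run(program, max_cycles=100):
    """Execute until unimp; return the register file and the list of PCs visited."""
    regs, pc, visited = [0] * 32, 0, []
    for _ in range(max_cycles):
        visited.append(pc)
        op, rd, rs1, x = program[pc]
        if op == "unimp":
            break
        if op == "addi":
            wd, next_pc = regs[rs1] + x, pc + 4
        elif op == "add":
            wd, next_pc = regs[rs1] + regs[x], pc + 4
        elif op == "jal":
            wd, next_pc = pc + 4, pc + x            # link PC+4, jump PC-relative
        elif op == "jalr":
            wd, next_pc = pc + 4, regs[rs1] + x     # link PC+4, jump to register + imm
        else:
            raise ValueError(f"unsupported instruction: {op}")
        if rd != 0:                                 # x0 discards writes
            regs[rd] = wd
        pc = next_pc
    return regs, visited

regs, visited = run(PROGRAM)
print(visited)                        # [0, 4, 8, 16, 20, 12]
print(regs[A0], regs[A1], regs[RA])   # 3 2 12
```

The visited list matches the PC column of the table, so a mismatch between this and your Digital simulation points to the first cycle where the circuit diverges.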

9. Common Mistakes¶

These are the failure modes that show up most often in the lab and in the autograder.

[1] PC update uses the wrong source.
    Symptom: jal jumps to the right place but ret runs off into the weeds,
             or vice-versa.
    Cause:   For jal the ALU A input must be PC; for jalr it must be RD0 (rs1).
             A stuck ALUSrcA makes both jumps use the same A source.
    Fix:     ALUSrcA = 1 only for jal; = 0 for jalr (and everything else).

[2] Link writes the ALU result instead of PC+4.
    Symptom: ret returns to the jump *target* address (a loop) instead of
             the instruction after the call.
    Cause:   WDsel left at 0, so rd gets the ALU output (the target) not PC+4.
    Fix:     WDsel = 1 for both jal and jalr.

[3] PCsel never selects the target.
    Symptom: jal "executes" but the PC still just advances by 4; the function
             body is never entered.
    Cause:   PCsel = 0 for the jump rows, or the PCsel MUX inputs are swapped.
    Fix:     PCsel = 1 for jal and jalr; verify input 0 = PC+4, input 1 = target.

[4] Wrong immediate fed to the ALU.
    Symptom: jal lands at a wrong (but plausible) address.
    Cause:   ALUSrcB selects imm-I for jal (should be imm-J) or vice-versa.
    Fix:     jal -> imm-J (ALUSrcB=10); jalr -> imm-I (ALUSrcB=01).

[5] Forgot to backfill new control columns for old instructions.
    Symptom: addi/add suddenly break after you add the new MUXes.
    Cause:   ALUSrcA/WDsel/PCsel are "x" or 1 for the old rows, so an old
             instruction accidentally jumps or links.
    Fix:     Set every new control output to 0 for addi and add.

[6] ROM word too narrow or split at the wrong bit positions.
    Symptom: several control lines are wrong at once for every instruction.
    Cause:   ROM data width does not match the number of control bits, or the
             output splitter's bit order disagrees with the spreadsheet.
    Fix:     ROM data bits = total control bits; match the splitter order to
             the "Output bits" column exactly (low bit = leftmost column).

A good rule for #1, #2, and #4: when something jumps "almost right," the immediate or A-source is wrong; when a return goes wrong, suspect WDsel (you linked the wrong value).

10. Incremental Development and Debugging¶

The lab explicitly asks for an incremental approach with two committed top-level circuits, lab10-part1.dig and lab10-part2.dig. Here is the recommended workflow, which mirrors the 5-step pattern for adding any instruction.

flowchart TD
    A["1. Pick instruction(s):<br/>jal then jalr"] --> B["2. Add components:<br/>PCsel, WDsel, ALUSrcA MUXes"]
    B --> C["3. Extend datapath:<br/>wire PC, PC+4, imm-J"]
    C --> D["4. Extend decoder:<br/>new rows + columns"]
    D --> E["5. Test in Digital:<br/>single-step, watch dashboard"]
    E -->|"bug"| C
    E -->|"passes"| F["commit lab10-part2.dig"]

Practical debugging tips from the guides:

Paste the .dig test into your processor so you can run the autograder's test directly in Digital and watch where it diverges.
Single-step with objdump open. Keep the disassembly beside the simulation so you can match each executed instruction to its expected effect.
Add probes/tunnels to the dashboard for PC, PC+4, iw, RS1, imm-I, imm-J, the ALU result, and PCsel. When a jump misbehaves, these instantly show whether the immediate, the A-source, or the select line is at fault.
Use EN to control start. Press play, pick PROG, then toggle EN to 1 so the program starts cleanly with the PC at 0.
Watch ra. After the jal cycle, ra must equal the address of the unimp after the call. If it does not, you have a WDsel/link bug before you ever reach the ret.

Key Concepts¶

Concept	Definition	Example
Link register	Register that holds the return address saved by a call	`ra` (`x1`) after `jal`
Return address	Address of the instruction after a call (`PC + 4`)	`ra = 12` after `jal` at PC 8
JAL	J-type jump-and-link; `rd = PC+4`, `PC = PC + imm`	`jal ra, first_s`
JALR	I-type jump-and-link-register; `rd = PC+4`, `PC = rs1 + imm`	`jalr x0, ra, 0` (`ret`)
PC-relative	Target computed by adding offset to current PC	`jal`'s `PC + imm`
Register-relative	Target computed from a register value	`jalr`'s `rs1 + imm`
PCsel	MUX choosing `PC+4` vs. jump target as next PC	`1` for `jal`/`jalr`
WDsel	MUX choosing ALU result vs. `PC+4` as write data	`1` writes the link
ALUSrcA	MUX choosing `RD0` vs. `PC` for ALU A input	`1` for `jal` (PC + imm)
ret	`jalr x0, ra, 0`: discard link, jump to `ra`	function return

Practice Problems¶

Problem 1: What does `ret` actually do?¶

ret is a pseudo-instruction. Write the full jalr it expands to, and explain why each of its operands has the value it does.

Click to reveal solution

ret  ==  jalr x0, ra, 0

- `rd = x0`: A return must not change the link register. Writing `PC + 4` into `x0` discards it, because `x0` always reads as zero. (The handwritten note: "always 0 for ret".)
- `rs1 = ra`: The return address was saved in `ra` by the `jal`/`call` that invoked the function. The PC update is `PC = rs1 + imm`, and we want to jump back to that saved address.
- `imm = 0`: We want to land *exactly* on the return address with no offset. (The handwritten note: "imm is 0 for ret".)

Net effect: `PC = ra + 0 = ra`. Control returns to the instruction after the call.

Problem 2: Compute a JAL target by hand¶

A jal ra, label instruction sits at byte address 0x40. Its J-type immediate decodes to imm = 0x20 (32 decimal). What value is written to ra, and what is the next PC?

Click to reveal solution

`jal` does two things:

- **Link:** `ra = PC + 4 = 0x40 + 4 = 0x44` (68 decimal).
- **Jump:** `PC = PC + imm = 0x40 + 0x20 = 0x60` (96 decimal).

So `ra = 0x44` and the next instruction executed is at `0x60`. Note the target is computed from the *current* PC (`0x40`), not from `PC + 4`.

Problem 3: Trace the registers¶

Hand-execute this program and give the final values of a0, a1, and ra. Assume main starts at address 0 and instructions are 4 bytes apart.

main:
    li a0, 4
    li a1, 5
    jal first_s
    unimp
first_s:
    add a0, a0, a1
    add a0, a0, a1
    ret

Click to reveal solution

Addresses: `li a0` @0, `li a1` @4, `jal` @8, `unimp` @12, `first_s add` @16, second `add` @20, `ret` @24.

| Step | Instruction | Effect | a0 | a1 | ra |
|------|-------------|--------|----|----|----|
| 1 | `li a0,4` | a0 = 4 | 4 | – | – |
| 2 | `li a1,5` | a1 = 5 | 4 | 5 | – |
| 3 | `jal first_s` @8 | ra = 8+4 = 12; PC → 16 | 4 | 5 | 12 |
| 4 | `add a0,a0,a1` | a0 = 4+5 = 9 | 9 | 5 | 12 |
| 5 | `add a0,a0,a1` | a0 = 9+5 = 14 | 14 | 5 | 12 |
| 6 | `ret` | PC → ra = 12 | 14 | 5 | 12 |
| 7 | `unimp` @12 | halt | 14 | 5 | 12 |

Final: `a0 = 14`, `a1 = 5`, `ra = 12`.

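Problem 4: Fill in the decoder row¶

Fill in the control outputs (RFW, ALUOp, ALUSrcB, ALUSrcA, WDsel, PCsel) for the jalr row of the decoder spreadsheet, and justify each value.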

Click to reveal solution

jalr:  RFW=1  ALUOp=000  ALUSrcB=01  ALUSrcA=0  WDsel=1  PCsel=1

- `RFW=1`, `WDsel=1`: write the link (`PC + 4`) into `rd`. (For `ret`, `rd = x0` so the write is discarded, but the control line is still 1.)
- `ALUOp=000`: the ALU adds to form `rs1 + imm`.
- `ALUSrcB=01`: select the I-type immediate (it is 0 for `ret`).
- `ALUSrcA=0`: keep `RD0` (the value of `rs1`, e.g. `ra`) on the ALU A input, **not** PC. This is the key difference from `jal`, which uses `ALUSrcA=1`.
- `PCsel=1`: take the computed target as the next PC.

Problem 5: Diagnose the bug¶

A student's processor runs the Part 2 program but loops forever: after first_s runs, ret jumps back to first_s instead of to the unimp. Which control line is most likely wrong, and why?

Click to reveal solution

The likely culprit is **`WDsel`** for the `jal` row (a link bug), not anything in the `ret` row.

If `jal`'s `WDsel = 0`, then during the `jal` cycle the register file's write data is the **ALU result** (the jump target, `first_s`'s address) instead of `PC + 4`. So `ra` ends up holding the address of `first_s`, not the address of the `unimp` after the call. Later, `ret` faithfully does `PC = ra + 0`. But `ra` is wrong (it points at `first_s`), so the processor re-enters the function and loops.

Diagnosis tip: after the `jal` cycle, look at `ra` on the dashboard. It should equal the address of the instruction after the `jal` (the `unimp`). If it equals the function entry address, fix `WDsel` for `jal`.

Problem 6: Why can one instruction implement both `call` and `j`?¶

Explain how jal serves as both the function-call primitive (call) and the unconditional-jump primitive (j), even though one needs a return address and the other does not.

Click to reveal solution

`jal rd, label` always does two things: it writes `PC + 4` into `rd` and sets `PC = PC + imm`. The flexibility is entirely in the choice of `rd`:

- **`call label` = `jal ra, label`**: link into `ra`, so the return address is preserved and a later `ret` can come back.
- **`j label` = `jal zero, label`**: link into `x0`. Since `x0` always reads as zero, writing `PC + 4` to it discards the return address. The result is a pure jump with no usable link.

So the same datapath (same `WDsel`, same `PCsel`, same ALU `PC + imm`) implements both. The only difference is the destination register field in the instruction word, which the register-write logic already handles (a write to `x0` is a no-op). No extra hardware is needed to distinguish a call from a goto.

Summary¶

Functions need control flow plus a return address. A jump that forgets where it came from is a one-way trip; the link (saving PC + 4) is what makes a return possible.
jal (jump and link) is the call primitive. It is J-type: rd = PC + 4 and PC = PC + imm (PC-relative). call and j are pseudo-instructions built on it.
jalr (jump and link register) is the return primitive. It is I-type: rd = PC + 4 and PC = rs1 + imm (register-relative). ret is jalr x0, ra, 0 — discard the link, jump to ra.
Both instructions link; they differ only in how the PC target is formed. jal adds the immediate to PC; jalr adds the immediate to a register. The same ALU add computes both targets.
The datapath grows by three MUXes. PCsel chooses PC+4 vs. target, WDsel chooses ALU result vs. PC+4 for the link, and ALUSrcA chooses RD0 vs. PC for the ALU A input; ALUSrcB widens to also select imm-J.
The decoder spreadsheet adds two rows and three columns. Set the new control outputs to 0 for the existing addi/add rows so Part 1 keeps working, and widen the control ROM to hold every control bit.
Most bugs are control-line mix-ups. A bad ALUSrcA or ALUSrcB makes a jump land in the wrong place; a bad WDsel makes ret return to the wrong address; a bad PCsel makes the jump not happen at all.
Develop incrementally and single-step with the dashboard. Commit lab10-part1.dig and lab10-part2.dig separately, keep objdump open beside the simulation, and watch ra, the PC, and the immediates to localize any failure fast.
