← Back to Course
# RISC-V Assembly Part 5: Functions and Recursion ## CS 315 Computer Architecture --- ## Learning Objectives - Explain how `call` and `ret` use `ra` to transfer control - Classify every register as **caller-saved** or **callee-saved** - Write correct function **prologues and epilogues** - Implement non-leaf functions with both strategies - Trace **recursive** execution through nested stack frames - Translate compound conditionals into RISC-V branches --- ## How `call` and `ret` Work ```asm call target # expands to: jal ra, target # ra = PC + 4 (address after the call) # PC = target (jump into the function) ret # expands to: jalr zero, ra, 0 # PC = ra (jump back to caller) ```
call
saves the return address in
ra
, then jumps.
ret
just sets
PC = ra
. There is no magic.
--- ## The call/ret Mechanism
flowchart LR A["foo\n0x1000: call bar\n(ra = 0x1004)"] -->|"ra = PC+4, jump to bar"| B["bar\n0x2000: add a0,a0,a1\n..."] B -->|"ret: PC = ra = 0x1004"| C["foo\n0x1004: addi a0,a0,1"] style B fill:#e8f5e9,stroke:#00543c,stroke-width:2px
- `call bar` at `0x1000` sets `ra = 0x1004` and jumps to `0x2000` - `ret` in `bar` sends control to `0x1004` (instruction after the call) --- ## The `ra` Danger
Problem:
Every
call
overwrites
ra
.
If
foo
was itself called and then calls
bar
, the original
ra
is destroyed.
When
foo
runs
ret
, it jumps back into itself —
infinite loop
.
**Fix:** Save `ra` on the stack in the prologue before any call. --- ## Caller-Saved vs Callee-Saved | Class | Registers | Preserved by | Survives a call? | |-------|-----------|--------------|-----------------| | Arguments / return | `a0`–`a7` | Caller | No | | Temporaries | `t0`–`t6` | Caller | No | | Return address | `ra` | Caller | No | | Saved | `s0`–`s11` | Callee | Yes | | Stack pointer | `sp` | Callee | Yes | --- ## Register Classes at a Glance
flowchart LR subgraph Caller["Caller-Saved (may be clobbered)"] A["a0-a7\nargs / return values"] T["t0-t6\ntemporaries"] RA["ra\nreturn address"] end subgraph Callee["Callee-Saved (always preserved)"] S["s0-s11\nsaved registers"] SP["sp\nstack pointer"] end
Save
a
/
t
registers
before
a call if you need them after.
Save
s
registers in the
prologue
if your function uses them.
--- ## The Stack - Stack **grows downward** — allocate by subtracting from `sp` - **`sp` must always be a multiple of 16** - Never write below `sp` ```asm addi sp, sp, -16 # allocate 16 bytes (sp moves DOWN) # ... use the space ... addi sp, sp, 16 # free 16 bytes (sp moves back UP) ``` Even if you only need 8 bytes (for `ra`), allocate 16. --- ## Prologue / Epilogue Pattern ```asm foo: addi sp, sp, -16 # prologue: allocate frame sd ra, 0(sp) # save ra # ... body, including calls ... ld ra, 0(sp) # epilogue: restore ra addi sp, sp, 16 # free frame ret ```
Leaf functions
(no calls, no s-regs used) need no frame at all.
--- ## Stack Frame Layout ```text Higher addresses +---------------------------+ <- sp before prologue | ...caller's frame... | +---------------------------+ <- sp (after addi sp, sp, -16) | ra (saved return addr) | sp + 0 | (padding / locals) | sp + 8 +---------------------------+ <- sp - 16 Lower addresses ``` Use `sd`/`ld` (8 bytes) for `ra` and `s` registers. Use `sw`/`lw` (4 bytes) for `int` values. --- ## Choosing Load/Store Width | Data | Store | Load (signed) | |------|-------|---------------| | 1 byte (`char`) | `sb` | `lb` | | 4 bytes (`int`) | `sw` | `lw` | | 8 bytes (pointer, `long`) | `sd` | `ld` | Always use `sd`/`ld` to save/restore registers (`ra`, `s0`, etc.). --- ## Leaf Function Example ```asm # max of two ints — no calls, so no frame needed .global max2_s max2_s: bge a0, a1, max2_done # if a0 >= a1, a0 is already max mv a0, a1 # else max = a1 max2_done: ret ``` No `addi sp` needed — `ra` is never overwritten. --- ## Non-Leaf: The Problem ```c int add4f(int a, int b, int c, int d) { return add2(add2(a, b), add2(c, d)); } ``` `add4f` must call `add2` three times. After each call, `a0`/`a1` (and `ra`) are overwritten. We must **save** what we need before each call. --- ## Strategy 1: Caller-Saved Approach Park live values **on the stack** around each call. ```asm add4f_s: addi sp, sp, -32 sd ra, 0(sp) sd a2, 8(sp) # save c sd a3, 16(sp) # save d call add2_s # a0 = a + b sd a0, 24(sp) # save tmp = a+b ld a0, 8(sp) # a0 = c ld a1, 16(sp) # a1 = d call add2_s # a0 = c + d mv a1, a0 ld a0, 24(sp) # a0 = a+b call add2_s # a0 = (a+b)+(c+d) ld ra, 0(sp) addi sp, sp, 32 ret ``` --- ## Strategy 2: Callee-Saved Approach Move values into `s` registers — they survive calls automatically. ```asm add4f_callee_s: addi sp, sp, -32 sd ra, 0(sp) sd s0, 8(sp) # save s0-s2 (we will use them) sd s1, 16(sp) sd s2, 24(sp) mv s0, a2 # s0 = c mv s1, a3 # s1 = d call add2_s # a0 = a+b mv s2, a0 # s2 = a+b mv a0, s0 ; mv a1, s1 call add2_s # a0 = c+d mv a1, a0 ; mv a0, s2 call add2_s # a0 = (a+b)+(c+d) ld ra, 0(sp) ; ld s0, 8(sp) ld s1, 16(sp) ; ld s2, 24(sp) addi sp, sp, 32 ret ``` --- ## Comparing the Two Strategies | Aspect | Caller-Saved | Callee-Saved | |--------|-------------|--------------| | Where values live | Stack | `s` registers | | Save/restore timing | Around each call | Once (prologue/epilogue) | | Stack accesses | More | Fewer | | Best when | One or two calls | Values survive many calls | Both are **correct**. Callee-saved is usually cleaner for multiple calls. --- ## Recursion: The Key Insight
A recursive function is
just a function that calls itself
.
The same rules apply: save
ra
, save caller-saved values you need after the call.
What makes it interesting: **each invocation gets its own stack frame** with its own copy of `n` and its own `ra`. --- ## Recursive Factorial in C ```c int factrec(int n) { if (n <= 0) { return 1; // base case } else { return n * factrec(n-1); // recursive step } } ``` Two things must survive the recursive call: 1. **`ra`** — every `call` overwrites it 2. **`n`** — the recursive call overwrites `a0` with `n-1` --- ## `factrec_s` in Assembly ```asm .global factrec_s # a0 = n factrec_s: addi sp, sp, -16 sd ra, 0(sp) # save return address bgt a0, zero, factrec_rec # if n > 0, recurse li a0, 1 # base case: return 1 j factrec_done factrec_rec: sd a0, 8(sp) # save this frame's n addi a0, a0, -1 # a0 = n - 1 call factrec_s # a0 = factrec(n-1) ld t0, 8(sp) # restore n mul a0, a0, t0 # a0 = factrec(n-1) * n factrec_done: ld ra, 0(sp) addi sp, sp, 16 ret ``` --- ## Recursive Expansion: `factrec(4)`
flowchart TD F4["factrec(4): n=4"] --> F3["factrec(3): n=3"] F3 --> F2["factrec(2): n=2"] F2 --> F1["factrec(1): n=1"] F1 --> F0["factrec(0): base case → 1"] F0 -->|"1"| R1["1 × 1 = 1"] R1 -->|"1"| R2["2 × 1 = 2"] R2 -->|"2"| R3["3 × 2 = 6"] R3 -->|"6"| R4["4 × 6 = 24"] style F0 fill:#fff3cd,stroke:#FDBB30,stroke-width:2px style R4 fill:#e8f5e9,stroke:#00543c,stroke-width:2px
--- ## Stack Growth During Recursion ```text Higher addresses +---------------------------+ | factrec(4): ra | n=4 | <- sp4 +---------------------------+ | factrec(3): ra | n=3 | <- sp3 +---------------------------+ | factrec(2): ra | n=2 | <- sp2 +---------------------------+ | factrec(1): ra | n=1 | <- sp1 +---------------------------+ | factrec(0): ra | (base) | <- sp0 (deepest) +---------------------------+ Lower addresses ``` Each frame is 16 bytes. The `mul` happens on the way **back up**. --- ## Execution Trace | Call | n | Saves | Returns | |------|---|-------|---------| | `factrec_s(4)` | 4 | ra, n=4 | 4 × 6 = **24** | | `factrec_s(3)` | 3 | ra, n=3 | 3 × 2 = **6** | | `factrec_s(2)` | 2 | ra, n=2 | 2 × 1 = **2** | | `factrec_s(1)` | 1 | ra, n=1 | 1 × 1 = **1** | | `factrec_s(0)` | 0 | ra | **1** (base) | --- ## Tail Recursion `factrec` is **not** tail recursive — `mul` happens after the call. **Tail recursive** version with accumulator: ```asm .global factacc_s # a0 = n, a1 = acc (start with acc=1) factacc_s: bgt a0, zero, factacc_step mv a0, a1 # base case: return acc ret factacc_step: mul a1, a1, a0 # acc = acc * n addi a0, a0, -1 # n = n - 1 j factacc_s # tail call → plain jump, no new frame! ```
Tail calls become a
j
(jump). Constant stack space — no frame needed.
--- ## Translating Compound Conditionals ```c if (lo <= x && x <= hi) { /* in range */ } ``` **Strategy:** branch **away** when the condition is false. ```asm # a0 = x, a1 = lo, a2 = hi blt a0, a1, out_of_range # x < lo -> fail blt a2, a0, out_of_range # hi < x -> fail # ---- in range ---- j range_end out_of_range: # ---- out of range ---- range_end: ``` Use **inverse comparisons**: branch when any part fails, fall through when all pass. --- ## Branch Instruction Reference | C comparison | RISC-V branch | Notes | |--------------|---------------|-------| | `x < y` | `blt x, y, L` | | | `x <= y` | `bge y, x, L` | swap operands | | `x > y` | `blt y, x, L` | swap operands | | `x >= y` | `bge x, y, L` | | | `x == y` | `beq x, y, L` | | | `x != y` | `bne x, y, L` | | `bgt` and `ble` are pseudo-instructions (assembler swaps operands). --- ## Fibonacci with Two Recursive Calls ```asm .global fibrec_s # a0 = n; fib(0)=0, fib(1)=1, fib(n)=fib(n-1)+fib(n-2) fibrec_s: addi sp, sp, -16 sd ra, 0(sp) li t0, 2 bge a0, t0, fib_rec # if n >= 2, recurse; else return n j fib_done fib_rec: sd a0, 8(sp) # save n addi a0, a0, -1 call fibrec_s # a0 = fib(n-1) ld t0, 8(sp) # restore n sd a0, 8(sp) # reuse slot: save fib(n-1) addi a0, t0, -2 call fibrec_s # a0 = fib(n-2) ld t0, 8(sp) # t0 = fib(n-1) add a0, a0, t0 # a0 = fib(n-2) + fib(n-1) fib_done: ld ra, 0(sp) addi sp, sp, 16 ret ``` --- ## Debugging Recursion with GDB ```bash gdb ./fibrec (gdb) break factrec_s (gdb) run 4 (gdb) stepi # step one instruction (gdb) info registers a0 ra sp # watch key registers (gdb) x/4gx $sp # examine 4 doublewords at sp (gdb) backtrace # see all recursive frames ``` Watch `sp` shrink by 16 on each recursive entry and grow back by 16 on each return — this makes the stack diagram concrete. --- ## Key Concepts Recap | Concept | Key Idea | |---------|----------| | `call`/`ret` | `ra` links caller and callee | | Caller-saved | `a0-a7`, `t0-t6`, `ra` — may be clobbered | | Callee-saved | `s0-s11`, `sp` — must be preserved | | Prologue/Epilogue | Allocate frame, save `ra`/`s` regs, restore, free | | Stack alignment | `sp` always a multiple of 16 | | Leaf function | No calls, no frame needed | | Recursion | Same rules; one frame per call | | Tail recursion | Recursive call last → jump, constant stack space | --- ## Common Bugs Checklist
Forgot to save
ra
?
→ infinite loop on
ret
Frame not 16-byte aligned?
→ crashes on aligned memory ops
Forgot to save
n
before recursive call?
→ wrong answer (multiply wrong value)
Rule:
save everything you need
after
a call before you make it.
--- ## Summary 1. `call` saves `PC+4` in `ra` and jumps; `ret` sets `PC = ra` 2. **Calling convention:** caller-saved (`a`, `t`, `ra`) vs callee-saved (`s`, `sp`) 3. **Prologue/epilogue:** allocate 16-byte-aligned frame, save `ra` (and `s` regs), restore, free 4. **Two non-leaf strategies:** caller-saved (stack around calls) vs callee-saved (`s` regs once) 5. **Recursion = function calling itself** — each call gets its own frame; `mul` happens on unwind 6. **Tail recursion** — last action is the call → becomes a `j`, no new frame 7. **Compound conditionals** — branch away on failure (inverse comparison)