← Back to Course
# Lab: RISC-V Assembly Part 4 — Functions and the Stack ## CS 315 Computer Architecture --- ## Learning Objectives - Explain what `call` and `ret` do to `pc` and `ra` - Describe why nested calls require saving `ra` - Allocate and deallocate stack space correctly - Apply the 16-byte stack-alignment rule - Distinguish **caller-saved** from **callee-saved** registers - Write a correct prologue and epilogue for a non-leaf function - Use `gdb` to verify the stack is balanced --- ## Leaf vs. Complex Functions **Leaf function** — calls nothing, needs no frame: ```text add_two: add a0, a0, a1 ret ``` **Complex (non-leaf) function** — calls other functions: - `call` overwrites `ra` — our own return address is lost - Caller-saved registers can be destroyed by the callee
Any function that calls another function must save its own
ra
on the stack.
--- ## How `call` and `ret` Work
sequenceDiagram participant foo as foo (caller) participant bar as bar (callee) Note over foo: add, addi, ... foo->>bar: call bar (ra = pc+4, pc = addr of bar) Note over bar: mul, add, ... bar-->>foo: ret (pc = ra) Note over foo: continues after the call
--- ## `call` and `ret` Instructions | Instruction | Pseudo for | Effect | |-------------|-----------|--------| | `call bar` | `jal ra, bar` | `ra = pc + 4`, then `pc = addr(bar)` | | `ret` | `jalr x0, ra, 0` | `pc = ra` |
call
: saves the return point in
ra
, then jumps.
ret
: copies
ra
back into
pc
. That's all.
--- ## The `ra` Clobber Problem There is only **one** `ra` register:
flowchart LR M[main] -->|"call foo\nra = back into main"| F[foo] F -->|"call bar\nra = back into foo"| B[bar] B -.->|"ret: pc = ra"| F F -.->|"ret: pc = ra"| M
If `foo` never saves its original `ra`, the second `call` **overwrites** the address back to `main`. `foo`'s `ret` jumps into `foo` again — **infinite loop**. --- ## The Stack: Layout and Direction ```text +-----------------+ higher addresses | caller's frame | sp -> +-----------------+ <-- top BEFORE allocation | slot 3 (8B) | +-----------------+ | slot 2 (8B) | +-----------------+ | slot 1 (8B) | +-----------------+ | slot 0 (8B) | sp -> +-----------------+ <-- top AFTER allocation | (free) | lower addresses ``` - `sp` points to the **lowest in-use address** - Stack **grows downward** — subtract to allocate, add to deallocate --- ## Stack Allocation and Deallocation ```text # Prologue: make room addi sp, sp, -16 # ... use stack via offsets from sp ... sd ra, 0(sp) sd s0, 8(sp) # Epilogue: give it back ld s0, 8(sp) ld ra, 0(sp) addi sp, sp, 16 ```
Golden rule: whatever you subtract on entry, you
must
add back on exit. A mismatch corrupts the stack for every caller above you.
--- ## The 16-Byte Alignment Rule `sp` must be a **multiple of 16** at every call boundary. | Bytes needed | Round up to | `addi` on entry | |-------------|-------------|-----------------| | 8 (just `ra`) | 16 | `addi sp, sp, -16` | | 16 (`ra` + one `s`) | 16 | `addi sp, sp, -16` | | 24 (`ra` + two `s`) | 32 | `addi sp, sp, -32` | | 40 | 48 | `addi sp, sp, -48` | On RV64, pointers and `ra` are **8 bytes** (doubleword). Use `sd`/`ld`, not `sw`/`lw`. --- ## RISC-V Register Map | ABI name | x-name | Purpose | |----------|--------|---------| | `zero` | `x0` | Always 0 | | `ra` | `x1` | Return address | | `sp` | `x2` | Stack pointer | | `a0–a7` | `x10–x17` | Arguments and return values | | `t0–t6` | `x5–x7, x28–x31` | Temporaries | | `s0–s11` | `x8–x9, x18–x27` | Saved registers | --- ## Caller-Saved vs. Callee-Saved
flowchart TD subgraph Caller["Caller saves before a call (if needed after)"] C1["a0-a7 (args / returns)"] C2["t0-t6 (temporaries)"] C3["ra (return address)"] end subgraph Callee["Callee saves on entry, restores before ret"] D1["s0-s11 (saved registers)"] D2["sp (stack pointer)"] end Caller -.->|spill to stack| Stack[(Stack)] Callee -.->|spill to stack| Stack
--- ## Caller vs. Callee: Summary Table | Class | Registers | Preserved by | When to save | |-------|-----------|--------------|-------------| | Temporaries | `t0–t6` | Caller | Before a call, only if needed after | | Args/returns | `a0–a7` | Caller | Before a call, only if needed after | | Return address | `ra` | Caller | In any non-leaf function | | Saved | `s0–s11` | Callee | On entry, only if function uses them | | Stack pointer | `sp` | Callee | Always balanced via matching add/sub | --- ## The Prologue / Epilogue Template Stack frame layout (saving `ra` and `s0`): ```text +-----------------+ sp + 8 (old sp was here) | s0 | callee-saved reg (8 bytes) +-----------------+ sp + 0 (new top) | ra | return address (8 bytes) +-----------------+ ``` ```text foo: addi sp, sp, -16 # prologue sd ra, 0(sp) sd s0, 8(sp) # ... body, may call other functions ... ld s0, 8(sp) # epilogue ld ra, 0(sp) addi sp, sp, 16 ret ``` --- ## `foo` Calls `bar`: Control Flow ```text foo: addi sp, sp, -16 bar (leaf): sd ra, 0(sp) add a0, a0, a1 sd s0, 8(sp) ret (pc = ra) . call bar ---------------> . <--------------- . ld s0, 8(sp) ld ra, 0(sp) addi sp, sp, 16 ret (pc = ra of foo's caller) ```
bar
is a leaf — no frame needed.
foo
is complex — full prologue/epilogue required.
--- ## Worked Example: `add4` Using `add2` ```c int add2(int x, int y) { return x + y; } int add4(int a, int b, int c, int d) { int t = add2(a, b); t = add2(t, c); t = add2(t, d); return t; } ``` **The trap:** `add2` clobbers `a0`/`a1`. After the first call, `c` and `d` (in `a2`, `a3`) survive — but `a2`/`a3` are caller-saved so they may also be gone after a call. **Fix:** Copy them into callee-saved `s0`, `s1` in the prologue. --- ## `add4` Assembly ```text add4: addi sp, sp, -32 # ra + s0 + s1 = 24 -> round to 32 sd ra, 0(sp) sd s0, 8(sp) sd s1, 16(sp) mv s0, a2 # s0 = c (survives calls) mv s1, a3 # s1 = d (survives calls) call add2 # a0 = a + b mv a1, s0 call add2 # a0 = (a+b) + c mv a1, s1 call add2 # a0 = (a+b+c) + d ld s1, 16(sp) ld s0, 8(sp) ld ra, 0(sp) addi sp, sp, 32 ret ``` --- ## Workflow: Logic First, Then Preservation 1. **Write correct logic first** — use registers freely, get the math right 2. **Identify what must survive calls** — anything needed after a `call` 3. **Move those values to `s` registers** — callee-saved, so calls can't destroy them 4. **Write prologue / epilogue** — save/restore used `s` regs and `ra` 5. **Check alignment** — frame size must be a multiple of 16
One prologue + one epilogue, not one per call site.
--- ## Checklist Before You Compile ```text [ ] Does this function call anything? If yes, save ra. [ ] Which s* registers do I use? Save/restore each. [ ] What values must survive across calls? Move to s registers. [ ] Is frame size a multiple of 16? [ ] Does every addi sp, sp, -N have a matching addi sp, sp, N? [ ] Is the return value in a0 before ret? ``` --- ## Bug Pattern: Forgetting to Save `ra` ```text # BROKEN: non-leaf with no ra save mysum: call helper # ra now points INSIDE mysum add a0, a0, t0 ret # jumps back into mysum -> infinite loop! ``` **Fix:** Add prologue/epilogue to save and restore `ra`. --- ## Bug Pattern: Mismatched Stack / Caller-Saved Reg ```text # BROKEN: allocate 16, deallocate 32 -> sp wrong on return foo: addi sp, sp, -16 ... addi sp, sp, 32 # sp now 16 bytes too high ret ``` ```text # BROKEN: t0 is caller-saved, may be gone after call li t0, 42 call something # something may overwrite t0 add a0, a0, t0 # t0 might not be 42 anymore! ``` **Fix for t0:** Keep the value in a callee-saved `s` register. --- ## Lab03 Connection: `min` (Leaf) ```text .global min # a0 = a, a1 = b ; return smaller in a0 min: blt a0, a1, done # if a < b, a0 is already correct mv a0, a1 # else answer is b done: ret ``` - No calls, no frame needed - Conditional-move pattern: pick one of two values via a branch --- ## Lab03 Connection: `find_max` (Leaf Version) ```text .global find_max # a0 = int *arr, a1 = int n ; return max in a0 find_max: lw t0, (a0) # t0 = max = arr[0] li t1, 1 # i = 1 loop: bge t1, a1, done slli t2, t1, 2 add t2, a0, t2 lw t3, (t2) # t3 = arr[i] bge t0, t3, skip mv t0, t3 skip: addi t1, t1, 1 j loop done: mv a0, t0 ret ``` --- ## `find_max` Built on `max2` Helper (Non-Leaf) ```text find_max: addi sp, sp, -32 sd ra, 0(sp) ; sd s0, 8(sp) ; sd s1, 16(sp) ; sd s2, 24(sp) mv s0, a0 # base ptr mv s1, a1 # n lw s2, (s0) # running max = arr[0] li t0, 1 # i = 1 fm_loop: bge t0, s1, fm_done slli t1, t0, 2 ; add t1, s0, t1 ; lw a1, (t1) mv a0, s2 ; mv s3, t0 # park i in s3 across call call max2 mv s2, a0 ; addi t0, s3, 1 j fm_loop fm_done: mv a0, s2 ld s2, 24(sp) ; ld s1, 16(sp) ; ld s0, 8(sp) ; ld ra, 0(sp) addi sp, sp, 32 ret ``` --- ## Debugging with `gdb` ```bash gcc -g -static -o find_max main.c find_max.s gdb ./find_max ``` ```text (gdb) break find_max (gdb) run (gdb) info registers sp ra # note entry values (gdb) stepi # one instruction at a time (gdb) x/4xg $sp # examine 4 doublewords at stack top ``` --- ## What to Verify in `gdb`
flowchart TD A["Entry: record sp and ra"] --> B["Prologue: sp -= N, save ra/s*"] B --> C["Body and calls: ra is clobbered here"] C --> D["Epilogue: restore ra/s*, sp += N"] D --> E["ret: pc = ra"] E --> F{"sp == entry sp?\nra == entry ra?"} F -->|yes| G["Correct return"] F -->|no| H["Stack or ra bug\nfix prologue/epilogue"]
--- ## Key Concepts Reference | Concept | One-line definition | |---------|---------------------| | `call bar` | `ra = pc+4`, then `pc = addr(bar)` | | `ret` | `pc = ra` | | Leaf function | Calls nothing; no frame needed | | Complex function | Calls others; must save `ra` and build frame | | Stack frame | Block of stack carved out with `addi sp, sp, -N` | | 16-byte alignment | `sp` multiple of 16 at every call boundary | | Caller-saved | `a0–a7`, `t0–t6`, `ra` — may be destroyed by a call | | Callee-saved | `s0–s11`, `sp` — preserved across any call | | Doubleword (DW) | 8 bytes; use `sd`/`ld` for registers on RV64 | --- ## Summary 1. **`call`** sets `ra = pc+4` and jumps; **`ret`** sets `pc = ra`. 2. One `ra` register — any non-leaf function must **save and restore it**. 3. Stack **grows downward**; allocate with `-N`, deallocate with `+N` — they must match exactly. 4. Keep `sp` **16-byte aligned** at call boundaries — round bytes needed up to a multiple of 16. 5. **Caller-saved** (`a0–a7`, `t0–t6`, `ra`) may be clobbered; **callee-saved** (`s0–s11`) are preserved. 6. Use `s` registers to hold values that must survive calls; save/restore them in the prologue/epilogue. 7. Verify with `gdb`: `sp` and `ra` must return to their entry values after the epilogue.