Lab: RISC-V Assembly Part 4 — Functions and the Stack¶
Overview¶
This hands-on lab session moves from writing single, self-contained ("leaf") functions to writing complex functions — functions that call other functions. To do this safely on RISC-V we have to understand three intertwined mechanisms: how the call and ret instructions move control between functions using the program counter (pc) and return address (ra); how the stack gives each function scratch space that survives across calls; and the calling convention that decides which registers a caller must protect and which a callee must protect. We build up a complete prologue/epilogue template, walk through it in gdb, and connect everything back to the min and find_max problems in Lab03.
Learning Objectives¶
- Explain what the `call` and `ret` instructions do to `pc` and `ra` at the hardware level
- Describe why nested function calls require saving and restoring the return address (`ra`)
- Allocate and deallocate stack space correctly using `addi sp, sp, -N` / `addi sp, sp, N`
- Apply the 16-byte stack-alignment rule and explain why it matters
- Distinguish caller-saved from callee-saved registers and decide which to spill
- Write a correct prologue and epilogue for a non-leaf function
- Use `gdb` to single-step through a call/return and verify the stack pointer is balanced
- Convert a working C function with helper calls into correct, ABI-compliant RISC-V assembly
Prerequisites¶
- RISC-V instruction basics: `add`, `addi`, `li`, `mul`, branches, `j`, labels (Lab02 / Assembly Parts 1–3)
- Memory instructions: `ld`/`sd` (doubleword), `lw`/`sw` (word), and base+offset addressing
- The fetch/decode/execute cycle and the role of the `pc`
- C functions, arguments, return values, and the notion of a call stack
- Comfort building and running programs with `make` and stepping with `gdb`
1. From Leaf Functions to Complex Functions¶
In the earlier assembly labs every function was a leaf function: it did some arithmetic and returned, without calling anything else. A leaf function is easy: it never touches `ra` after entry and usually needs no stack frame at all. The smallest example is a two-instruction body such as `add a0, a0, a1` followed by `ret`.
A complex function (the instructor's term for a non-leaf function) is one that calls at least one other function. The moment a function makes a call, two new problems appear:
- The `call` instruction overwrites `ra`. If we are inside a function that was itself called, our own return address gets clobbered.
- Any value we are keeping in a caller-saved register can be destroyed by the function we call.
The rest of this lab is about solving those two problems systematically. The diagram below is the mental model the instructor drew: a caller foo transferring control to a callee bar and getting control back.
sequenceDiagram
participant foo as foo (caller)
participant bar as bar (callee)
Note over foo: pc advances: add, addi (pc = pc + 4)
foo->>bar: call bar (ra = pc+4, pc = addr of bar)
Note over bar: mul, add, ...
bar-->>foo: ret (pc = ra)
Note over foo: continues at instruction after the call
The key insight: a function call is just a controlled jump that remembers where to come back to. That "where to come back to" lives in ra.
2. The Program Counter and Return Address¶
Two registers drive all control flow:
| Register | ABI name | x-name | Role |
|---|---|---|---|
| Program counter | `pc` | none | Address of the instruction currently being executed; normally advances by 4 each instruction |
| Return address | `ra` | `x1` | Where control should resume after the current function returns |
Normal straight-line execution simply does pc = pc + 4 after each 4-byte instruction. A branch or jump sets pc to some other address. A function call and return are built from these primitives.
What call does¶
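`call bar` is a pseudo-instruction. In the simple case it is equivalent to `jal ra, bar` (jump and link), which is the model used throughout this lab; the toolchain may expand it to a longer `auipc`/`jalr` sequence for distant targets, but either way `ra` ends up holding the address of the instruction after the call. It performs two updates: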
call:
1) ra = pc + 4 # remember the instruction AFTER the call
2) pc = addr of bar # jump to the target function (its label)
Step 1 is the "link" part: it stashes the address of the next instruction into ra so the callee knows where to return. Step 2 is the "jump" part: control transfers to the first instruction of bar.
What ret does¶
`ret` is a pseudo-instruction for `jalr x0, ra, 0` (jump to the address in `ra`, discarding any link). It performs a single update: `pc = ra`.
So ret just copies ra into pc. That is the entire return mechanism. If ra holds the right address, you return to the right place. If something corrupted ra, you return to the wrong place — often an infinite loop or a crash.
caller foo:                      callee bar:
pc ->  add                       pc ->  mul
       addi  (pc = pc + 4)              add
       .                                .
       .                                .
pc ->  call bar ----------------------> .
       ra = pc + 4                      .
       .   <--------------------------- ret  (pc = ra)
       .
       ret  (pc = ra, back into foo's caller)
Mapping to the handwritten diagram: the long arrow from `call bar` over to `mul` is step 2 of `call` (`pc` jumps to `bar`). The arrow coming back from `bar`'s `ret` to just below `call bar` is `ret` setting `pc = ra`.
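The two pseudo-instructions can be modeled in a few lines of C. This is a minimal sketch rather than a description of real hardware: the `Cpu` struct and the `do_call`/`do_ret` helpers are hypothetical names introduced here, and the model assumes the call is a single 4-byte instruction.

```c
#include <stdint.h>

/* Hypothetical two-register model of the control-flow state. */
typedef struct {
    uint64_t pc;  /* address of the instruction being executed */
    uint64_t ra;  /* return address register (x1) */
} Cpu;

/* call target: link first (ra = pc + 4), then jump (pc = target). */
void do_call(Cpu *cpu, uint64_t target) {
    cpu->ra = cpu->pc + 4;
    cpu->pc = target;
}

/* ret: copy ra into pc; nothing is linked. */
void do_ret(Cpu *cpu) {
    cpu->pc = cpu->ra;
}
```

Starting from `pc = 0x1010`, `do_call(&cpu, 0x2000)` leaves `pc = 0x2000` and `ra = 0x1014`; a following `do_ret(&cpu)` sets `pc` back to `0x1014`.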
3. Why Nested Calls Break: The ra Clobber Problem¶
Here is the trap. There is exactly one ra register. Suppose main calls foo, and foo calls bar:
flowchart LR
M[main] -->|call foo<br/>ra = return into main| F[foo]
F -->|call bar<br/>ra = return into foo| B[bar]
B -.ret -> pc = ra.-> F
F -.ret -> pc = ra.-> M
Trace what happens to ra if foo does not protect it:
main: call foo # ra = (address in main, just after this call)
foo: ...
call bar # ra = (address in foo, just after this call) <-- OVERWRITES main's ra!
...
ret # pc = ra -> jumps back into foo's own body, NOT into main
After foo executes call bar, the original ra (the address back into main) is gone — it was overwritten with the address inside foo. When bar later does ret, it correctly returns into foo. But when foo then does ret, ra still points back into foo (or wherever the last call left it), not into main. The program returns to the wrong place. A common symptom is an infinite loop: foo returns into itself.
The fix: because `ra` is a caller-saved register (Section 6), any function that makes a call must save its own `ra` before that call and restore it before `ret`. The standard place to keep it is the stack, because the stack gives every function invocation its own private storage.
This is exactly why leaf functions need no stack frame (they never call anything, so their ra is never clobbered) while complex functions almost always do.
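The clobber can be replayed in C by tracking `ra` as a plain variable. This is an illustrative sketch only: the addresses `MAIN_CALL` and `FOO_CALL` and the function `foo_return_target` are made up for the example, and `slot` stands in for the doubleword at `0(sp)` in `foo`'s frame.

```c
#include <stdint.h>

/* Made-up addresses, for illustration only. */
enum {
    MAIN_CALL = 0x100,   /* address of `call foo` inside main */
    FOO_CALL  = 0x1010   /* address of `call bar` inside foo  */
};

/* Returns the address that foo's final `ret` jumps to.
   save_ra selects whether foo spills ra before calling bar. */
uint64_t foo_return_target(int save_ra) {
    uint64_t ra = MAIN_CALL + 4;  /* main: call foo -> ra = return into main */
    uint64_t slot = 0;            /* models the stack slot at 0(sp) */

    if (save_ra) slot = ra;       /* prologue: sd ra, (sp) */
    ra = FOO_CALL + 4;            /* foo: call bar -> ra is overwritten */
    /* bar: ret -> pc = ra = FOO_CALL + 4, execution resumes inside foo */
    if (save_ra) ra = slot;       /* epilogue: ld ra, (sp) */
    return ra;                    /* foo: ret -> pc = ra */
}
```

Without the save, `foo_return_target(0)` yields `0x1014`, an address inside `foo` itself; with it, `foo_return_target(1)` yields `0x104`, the instruction after the call in `main`.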
4. The Stack: Layout and the Stack Pointer¶
The stack is a region of memory used for temporary, per-function storage: saved registers, the return address, and local variables that do not fit in registers. It is managed entirely through the stack pointer, sp (x2).
Direction and the stack pointer¶
- `sp` always points to the current top of the stack, which is the lowest in-use address.
- The stack grows downward (toward smaller addresses). To make room you subtract from `sp`; to release room you add back.
        Memory
     +-----------------+   higher addresses
     |/////////////////|   <- previously used / caller's frame
sp ->+-----------------+ ----.
     |       int       |     |  4 words (16 bytes) of
     +-----------------+     |  scratch space this
     |       int       |     |  function allocated
     +-----------------+     |
     |       int       |     |
     +-----------------+     |
     |       int       |     |
sp ->+-----------------+ ----'   <- new top after allocation
     |                 |
     |     (free)      |   lower addresses
     +-----------------+
In the handwritten diagram, the upper `sp` arrow is before allocation and the lower `sp` arrow is after allocation: `sp` moved down to open up four `int` slots. The brace labeled d marks the bytes this function carved out for itself.
Allocation and deallocation¶
There are exactly two operations, and they are mirror images:
# Stack allocation: make room (move sp DOWN by 16 bytes)
addi sp, sp, -16
# ... use the 16 bytes via offsets from sp ...
# Stack deallocation: give it back (move sp UP by 16 bytes)
addi sp, sp, 16
The golden rule the instructor stressed: whatever you subtract on entry, you must add back on exit. If allocation and deallocation do not match exactly, sp is left pointing at the wrong place when you ret, and the stack is corrupted for every function above you. Note that sp itself is never stored in memory — it is a register you adjust by arithmetic; you preserve it simply by accounting for every byte you take and give back.
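The allocate/use/deallocate cycle can be mimicked in C with a byte array and an offset. This is a rough sketch under stated assumptions: the `Stack` type and its helpers are hypothetical, there is no bounds checking, and `sp` is an index into the array rather than a real address.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

/* Hypothetical model: sp starts at the high end of mem and moves
   toward index 0 as the stack grows. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint64_t sp;
} Stack;

void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_BYTES;
}

/* addi sp, sp, -n */
void stack_alloc(Stack *s, uint64_t n) { s->sp -= n; }

/* addi sp, sp, n */
void stack_dealloc(Stack *s, uint64_t n) { s->sp += n; }

/* sd value, off(sp) */
void store_dw(Stack *s, uint64_t off, uint64_t value) {
    memcpy(&s->mem[s->sp + off], &value, sizeof value);
}

/* ld off(sp) */
uint64_t load_dw(const Stack *s, uint64_t off) {
    uint64_t value;
    memcpy(&value, &s->mem[s->sp + off], sizeof value);
    return value;
}
```

A balanced function calls `stack_alloc(&s, 16)` on entry and `stack_dealloc(&s, 16)` on exit, so `s.sp` ends where it started; changing only one of the two constants leaves `s.sp` permanently off by the difference.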
5. The 16-Byte Alignment Rule¶
sp must be a multiple of 16 at every call boundary. That is why even when we only need 8 bytes (one doubleword for ra), we allocate 16:
addi sp, sp, -16 # allocate 16 even if we only need 8
sd ra, (sp) # ra in the bottom 8 bytes; top 8 bytes are padding
Why 16 and not 8?
- The RISC-V ABI requires 16-byte stack alignment so that some instructions (especially vector and floating-point loads/stores) and the C library can assume aligned addresses.
- Following the rule unconditionally is defensive: even if your function would technically work misaligned, a C function you call might crash. Always assume the worst case when you call into library code.
So the practical recipe is: pick the number of bytes you actually need, then round up to the next multiple of 16.
| Bytes needed | Round up to | addi on entry |
|---|---|---|
| 8 (just `ra`) | 16 | `addi sp, sp, -16` |
| 16 (`ra` + one `s` reg) | 16 | `addi sp, sp, -16` |
| 24 (`ra` + two `s` regs) | 32 | `addi sp, sp, -32` |
| 40 | 48 | `addi sp, sp, -48` |
On a 64-bit machine a pointer (and ra) is 8 bytes — a doubleword (DW). We therefore use sd (store doubleword) and ld (load doubleword), not sw/lw, when saving registers.
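The rounding step in the recipe is a one-line computation. The helper below is a small sketch (the name `frame_size` is introduced here, not taken from the course) that rounds a byte count up to the next multiple of 16 using the usual add-then-mask trick.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 16.
   Adding 15 pushes any non-multiple past the next boundary;
   clearing the low four bits then snaps down onto it. */
size_t frame_size(size_t bytes_needed) {
    return (bytes_needed + 15) & ~(size_t)15;
}
```

This reproduces the table: `frame_size(8)` and `frame_size(16)` both give 16, `frame_size(24)` gives 32, and `frame_size(40)` gives 48.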
6. RISC-V Registers and the Calling Convention¶
There are only 32 registers and every function shares them, so we need a convention that says who is responsible for preserving what. Here is the register map the instructor wrote out:
| x-name | ABI name | Purpose |
|---|---|---|
| `x0` | `zero` | Always reads as 0; writes are ignored |
| `x1` | `ra` | Return address |
| `x2` | `sp` | Stack pointer |
| `x3` | `gp` | Global pointer ("not going to use") |
| `x4` | `tp` | Thread pointer ("not going to use") |
| `x10`–`x17` | `a0`–`a7` | Arguments and return values |
| `x5`–`x7`, `x28`–`x31` | `t0`–`t6` | Temporaries |
| `x8`–`x9`, `x18`–`x27` | `s0`–`s11` | Saved registers |
Caller-saved vs callee-saved¶
This is the heart of the convention. The question each class answers is: "After a call, can I trust this register still holds my value?"
Caller-saved (volatile): a0-a7, t0-t6, ra
-> A call MAY destroy these. If the CALLER needs a value after
the call, the caller must save it first.
Callee-saved (preserved): s0-s11, sp
-> A call WILL leave these intact. If the CALLEE wants to use one,
the callee must save the old value on entry and restore on exit.
flowchart TD
subgraph Caller["Caller's responsibility"]
C1["a0-a7 (arguments / returns)"]
C2["t0-t6 (temporaries)"]
C3["ra (return address)"]
end
subgraph Callee["Callee's responsibility"]
D1["s0-s11 (saved registers)"]
D2["sp (stack pointer)"]
end
Caller -.->|"save BEFORE a call<br/>if needed afterward"| Stack[(Stack)]
Callee -.->|"save ON ENTRY,<br/>restore BEFORE ret"| Stack
Two practical consequences:
- You only spill the registers you actually need across the call. If you do not care about `t0`'s value after a `call`, do not save it. The convention tells you what might change, not what you must save.
- `ra` is caller-saved, which is the formal reason a non-leaf function must save its own `ra`: relative to the function it calls, it is the caller, so it is responsible for preserving `ra` across the inner call.
| Class | Registers | Preserved by | When to save |
|---|---|---|---|
| Temporaries | `t0`–`t6` | Caller | Before a call, only if you need the value after it |
| Args / returns | `a0`–`a7` | Caller | Before a call, only if you need the value after it |
| Return address | `ra` | Caller | In any function that makes a call |
| Saved | `s0`–`s11` | Callee | On entry, only if your function uses that `s` register |
| Stack pointer | `sp` | Callee | Always balanced via matching add/sub |
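The register map and the caller/callee split can be folded into one lookup. The sketch below is only a study aid: `RegClass` and `reg_class` are hypothetical names, and `zero`, `gp`, and `tp` are lumped into an "other" bucket because this lab does not use them.

```c
typedef enum { REG_OTHER, REG_CALLER_SAVED, REG_CALLEE_SAVED } RegClass;

/* Classify integer register xN according to the tables above. */
RegClass reg_class(int x) {
    if (x == 1)             return REG_CALLER_SAVED;  /* ra      */
    if (x == 2)             return REG_CALLEE_SAVED;  /* sp      */
    if (x >= 5 && x <= 7)   return REG_CALLER_SAVED;  /* t0-t2   */
    if (x == 8 || x == 9)   return REG_CALLEE_SAVED;  /* s0-s1   */
    if (x >= 10 && x <= 17) return REG_CALLER_SAVED;  /* a0-a7   */
    if (x >= 18 && x <= 27) return REG_CALLEE_SAVED;  /* s2-s11  */
    if (x >= 28 && x <= 31) return REG_CALLER_SAVED;  /* t3-t6   */
    return REG_OTHER;                /* zero, gp, tp, or out of range */
}
```

For example, `reg_class(8)` (`s0`) is `REG_CALLEE_SAVED`, while `reg_class(5)` (`t0`) and `reg_class(1)` (`ra`) are `REG_CALLER_SAVED`.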
7. The Full Function-Call Template¶
Putting Sections 2–6 together gives the canonical prologue / epilogue pattern. This is the exact example from the last handwritten page: foo saves ra and s0, calls bar, then restores and returns.
Stack frame layout¶
We need to save two doublewords — ra and s0 — so we allocate 16 bytes (already a multiple of 16):
after addi sp, sp, -16
+-----------------+ <- sp + 16 (where the old sp pointed)
|       s0        |    callee-saved register we use, stored at 8(sp)
+-----------------+ <- sp + 8
|       ra        |    our return address (a doubleword, DW), stored at 0(sp)
+-----------------+ <- sp + 0 (the new sp, top of the stack)
In the handwritten frame, `s0` (blue) sits at `sp+8` and `ra` (red) sits at `sp`, with the brace marked "DW" reminding us each slot is a doubleword (8 bytes) on the 64-bit machine. The old `sp` was at `sp+8+8`, that is `sp+16`; after `addi sp, sp, -16` the new `sp` points at the `ra` slot.
The code¶
.global foo
foo:
addi sp, sp, -16 # prologue: allocate a 16-byte frame
sd ra, (sp) # save our return address at offset 0
sd s0, 8(sp) # save callee-saved s0 at offset 8
# ... body: we may freely use s0 and may call other functions ...
call bar # ra is now clobbered, but our copy is safe on the stack
# ... more body, using s0 etc. ...
ld s0, 8(sp) # epilogue: restore s0
ld ra, (sp) # restore our return address
addi sp, sp, 16 # deallocate the frame (matches the -16 above)
ret # pc = ra -> back to our caller
bar:
add a0, a0, a1 # bar is a leaf here: does its work...
# ...
ret # pc = ra (the address foo's `call` stored)
Walk the control flow once:
foo:  addi sp,sp,-16             callee bar:
      sd   ra,(sp)                 add a0,a0,a1
      sd   s0,8(sp)                .
      .                            .
      call bar ------------------> .
      .   <----------------------- ret  (pc = ra)
      .
      ld   s0,8(sp)
      ld   ra,(sp)
      addi sp,sp,16
      ret  (pc = ra)
bar is a leaf, so it does not allocate a frame or save ra — it just computes and returns. foo is the complex function, so it does all the prologue/epilogue work.
Why restore ra at the end rather than reload after each call?¶
The instructor noted you could reload ra immediately after each call, but restoring it once at the end of the function is cleaner and avoids overhead: you write the save/restore once around the whole body instead of around every individual call. Save on entry, restore on exit — one prologue, one epilogue.
8. Worked Example: A Function That Adds Four Numbers¶
The instructor built a complex function from a simple one: implement add4(a, b, c, d) using three calls to a leaf add2(x, y). This forces us to confront caller-saved registers, because the arguments c and d must survive across calls that overwrite a0/a1.
The C we are translating¶
int add2(int x, int y) {
return x + y;
}
int add4(int a, int b, int c, int d) {
int t = add2(a, b); // t = a + b
t = add2(t, c); // t = (a+b) + c
t = add2(t, d); // t = (a+b+c) + d
return t;
}
The trap¶
In `add4`, arguments arrive as `a0=a`, `a1=b`, `a2=c`, `a3=d`. But `add2` returns its result in `a0` and is free to clobber any caller-saved register. After the first `call add2`, the registers holding `c` and `d` (`a2`, `a3`) are caller-saved, so their contents may be gone. We must protect anything we still need across each call.
The clean approach: copy the still-needed arguments into callee-saved s registers up front (which survive calls automatically), then make the three calls.
.global add4
# a0=a, a1=b, a2=c, a3=d ; result in a0
add4:
addi sp, sp, -32 # frame for ra, s0, s1 -> round 24 up to 32
sd ra, (sp)
sd s0, 8(sp)
sd s1, 16(sp)
mv s0, a2 # s0 = c (survives calls)
mv s1, a3 # s1 = d (survives calls)
call add2 # a0 = a + b (a0=a, a1=b already in place)
mv a1, s0 # set up a1 = c
call add2 # a0 = (a+b) + c
mv a1, s1 # set up a1 = d
call add2 # a0 = (a+b+c) + d <- final result in a0
ld s1, 16(sp)
ld s0, 8(sp)
ld ra, (sp)
addi sp, sp, 32
ret
# a0=x, a1=y ; result in a0 (leaf)
add2:
add a0, a0, a1
ret
Why `s` registers instead of pushing `a2`/`a3` to the stack repeatedly? Because `s0`/`s1` are callee-saved: once we save their old values in the prologue, every `call add2` is guaranteed to leave them untouched. We trade a save/restore around each of the three calls for a single save/restore. This follows the "implement logic first, then add preservation" workflow the instructor recommended.
9. Recommended Workflow: Logic First, Then Preservation¶
Adding calls increases complexity sharply because of caller-saved registers and stack management. The instructor's recommended order of operations:
- Write the correct logic first, ignoring preservation. Use registers freely as if you owned them all. Get the math right.
- Then add stack management: figure out which registers must survive calls, choose `s` registers or stack slots, and write the prologue/epilogue.
- Follow alignment rules: total frame size a multiple of 16; matched allocate/deallocate.
A useful review checklist when converting any C function to call-based assembly:
[ ] What are the function's arguments, and which must survive across calls?
[ ] Does this function call anything? If yes, it must save ra.
[ ] Which callee-saved (s*) registers will I use? Save/restore each.
[ ] Is my frame size a multiple of 16?
[ ] Does every `addi sp, sp, -N` have a matching `addi sp, sp, N`?
[ ] Is the return value in a0 before ret?
Bug pattern: forgetting to save ra¶
# BROKEN: non-leaf function that does not save ra
mysum:
call helper # clobbers ra with an address inside mysum
add a0, a0, t0
ret # pc = ra -> jumps back into mysum -> infinite loop!
Bug pattern: mismatched stack adjustment¶
# BROKEN: allocate 16, deallocate 32
foo:
addi sp, sp, -16
...
addi sp, sp, 32 # sp is now 16 bytes too high -> caller's saved data lost
ret
Bug pattern: relying on a caller-saved register after a call¶
# BROKEN: t0 needed after the call but it is caller-saved
li t0, 42
call something # `something` may overwrite t0
add a0, a0, t0 # t0 might no longer be 42!
The fix for the last one is to keep the value in a callee-saved s register, or spill it to the stack around the call.
10. Connecting to Lab03: min and find_max¶
Lab03 (see the spec at /assignments/lab03/) asks for min, quadratic, sum_array, and find_max. Two of these are good candidates for the function-and-stack techniques from today.
min as inline logic vs a helper¶
min(a, b) can be done entirely inline with a branch — no calls, no stack, a pure leaf function:
.global min
# a0 = a, a1 = b ; return smaller in a0
min:
blt a0, a1, done # if a < b, a0 already holds the answer
mv a0, a1 # else answer is b
done:
ret
This is the "ternary-like" / conditional-move pattern the instructor mentioned: pick one of two values based on a comparison. Because min calls nothing, it needs no frame.
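For comparison, the same selection written with C's ternary operator is shown below. It is a hypothetical reference (`min_ref` is a name chosen here, and the Lab03 spec may use a different signature), useful mainly for checking the assembly's results.

```c
/* Reference for the assembly above: return a when a < b, else b. */
int min_ref(int a, int b) {
    return (a < b) ? a : b;
}
```

The condition `a < b` corresponds to the `blt a0, a1, done` branch, and the else arm corresponds to `mv a0, a1`.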
find_max computed in a loop (leaf, no calls)¶
You can find the maximum inline, scanning the array with a loop — still a leaf function:
.global find_max
# a0 = int *arr, a1 = int n ; return max in a0
find_max:
lw t0, (a0) # t0 = max = arr[0]
li t1, 1 # i = 1
loop:
bge t1, a1, fmdone # if i >= n, done
slli t2, t1, 2 # t2 = i * 4
add t2, a0, t2 # t2 = &arr[i]
lw t3, (t2) # t3 = arr[i]
bge t0, t3, skip # if max >= arr[i], skip
mv t0, t3 # max = arr[i]
skip:
addi t1, t1, 1 # i++
j loop
fmdone:
mv a0, t0 # return max
ret
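A C reference for the loop above makes the register roles easier to follow. This is a sketch with assumed names (`find_max_ref` is not from the Lab03 spec), and like the assembly it assumes `n >= 1` so that `arr[0]` exists.

```c
/* Mirrors the leaf assembly: max lives in t0, i in t1, arr[i] in t3. */
int find_max_ref(const int *arr, int n) {
    int max = arr[0];                 /* lw t0, (a0)      */
    for (int i = 1; i < n; i++) {     /* li t1, 1 ... bge */
        if (arr[i] > max) {           /* bge t0, t3, skip */
            max = arr[i];             /* mv t0, t3        */
        }
    }
    return max;                       /* mv a0, t0        */
}
```

Running it on the same inputs as the assembly version (for instance the array `1 2 99 3 4`, which should give 99) is a quick way to cross-check your implementation.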
find_max built on a max2 helper (non-leaf, needs a frame)¶
Alternatively, factor out max2(a, b) and call it in the loop. Now find_max becomes a complex function: it calls max2, so it must save ra, and it must keep the running max, the index, the base pointer, and the count across each call. The clean way is to park those in callee-saved registers:
.global find_max
# a0 = int *arr, a1 = int n ; return max in a0
find_max:
addi sp, sp, -48 # ra + s0..s3 = 5 doublewords = 40 bytes -> round up to 48
sd ra, (sp)
sd s0, 8(sp)
sd s1, 16(sp)
sd s2, 24(sp)
sd s3, 32(sp) # s3 is written below, so its old value must be saved too
mv s0, a0 # s0 = base pointer (survives calls)
mv s1, a1 # s1 = n
lw s2, (s0) # s2 = running max = arr[0]
li t0, 1 # i = 1 (t0 is fine between calls; it is parked in s3 across each call)
fm_loop:
bge t0, s1, fm_done # if i >= n, done
# compute &arr[i] then load arr[i]
slli t1, t0, 2
add t1, s0, t1
lw a1, (t1) # a1 = arr[i] (second arg to max2)
mv a0, s2 # a0 = running max (first arg)
# i is in t0 (caller-saved) and would be lost across the call -> move it
mv s3, t0 # park i in callee-saved s3 across the call
call max2 # a0 = max2(running_max, arr[i])
mv s2, a0 # update running max
addi t0, s3, 1 # i++ (restore i from s3, increment)
j fm_loop
fm_done:
mv a0, s2 # return running max
ld s3, 32(sp)
ld s2, 24(sp)
ld s1, 16(sp)
ld s0, 8(sp)
ld ra, (sp)
addi sp, sp, 48
ret
# a0 = a, a1 = b ; return larger (leaf)
max2:
bge a0, a1, m2done
mv a0, a1
m2done:
ret
Note: the loop parks `i` in `s3`, so `s3` must be saved and restored like any other callee-saved register; that fifth doubleword is what pushes the frame from 32 to 48 bytes. If you keep `i` in `s3` from the start, you do not need the `mv s3, t0` step at all. This is the trade-off between using more `s` registers (bigger frame, simpler loop) and juggling temporaries. The leaf version in the previous block (no `max2`) is simpler still; the helper version exists to practice the call/stack mechanics.
The takeaway from the code review: decide up front which values must outlive your calls, and give those values callee-saved homes so the calling convention does the protecting for you.
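Seen from C, the helper-based version makes the "values that must outlive the call" explicit. The sketch below uses assumed names (`max2_ref`, `find_max_calls`) and, like the assembly, assumes `n >= 1`; the comments mark every variable that is live across the call and therefore needs a callee-saved home or a stack slot in the assembly.

```c
/* Leaf helper: return the larger of a and b. */
int max2_ref(int a, int b) {
    return (a >= b) ? a : b;
}

/* Non-leaf: arr, n, max, and i are all live across the call to
   max2_ref, so the assembly version must keep each of them somewhere
   the call cannot destroy. */
int find_max_calls(const int *arr, int n) {
    int max = arr[0];
    for (int i = 1; i < n; i++) {
        max = max2_ref(max, arr[i]);  /* result comes back in a0 */
    }
    return max;
}
```

A C compiler makes the same decision automatically; writing the assembly by hand means making it yourself, one variable at a time.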
11. Debugging Call/Return in gdb¶
A gdb walkthrough is the fastest way to see the stack pointer move and confirm ra is preserved. Build with debug symbols and step instruction-by-instruction.
Useful commands for stepping through a call:
(gdb) break find_max # stop at function entry
(gdb) run 1 2 99 3 4 # run with command-line args
(gdb) info registers sp ra # inspect the stack pointer and return address
(gdb) stepi # execute ONE instruction (si)
(gdb) x/4xg $sp # examine 4 doublewords at the top of the stack
(gdb) p $sp # print sp as a value
(gdb) continue # run to next breakpoint
What to verify, matching the gdb demo from class:
- At entry, note `sp` and `ra`.
- After the prologue (`addi sp,sp,-N; sd ra,(sp)`), confirm `sp` decreased by `N` and that the saved `ra` matches the entry `ra` (`x/1xg $sp`).
- Right after a `call`, confirm `ra` changed (it now points back into your own function), proving why you needed to save it.
- After the epilogue (`ld ra,(sp); addi sp,sp,N`), confirm `ra` is restored to the entry value and `sp` is back to the entry value.
- At `ret`, confirm control returns to the correct caller.
If sp does not return to its entry value, your allocate/deallocate are mismatched. If ra is wrong at ret, you forgot to save or restore it.
flowchart TD
A["Entry: record sp, ra"] --> B["Prologue: sp -= N, save ra/s*"]
B --> C["Body + call: ra now clobbered"]
C --> D["Epilogue: restore ra/s*, sp += N"]
D --> E["ret: pc = ra"]
E --> F{"sp back to entry?<br/>ra back to entry?"}
F -->|yes| G["Correct return"]
F -->|no| H["Stack/ra bug -> fix prologue/epilogue"]
Key Concepts¶
| Concept | Definition | Example |
|---|---|---|
| `call` | Pseudo-instruction (`jal ra, label`) that sets `ra = pc + 4` then jumps to the target | `call bar` |
| `ret` | Pseudo-instruction (`jalr x0, ra, 0`) that sets `pc = ra` | `ret` |
| Return address (`ra`) | Register `x1` holding where to resume after `ret`; caller-saved | `sd ra, (sp)` |
| Stack pointer (`sp`) | Register `x2` pointing at the current top of the (downward-growing) stack | `addi sp, sp, -16` |
| Leaf function | A function that calls nothing; needs no `ra` save and usually no frame | `add a0,a0,a1; ret` |
| Complex (non-leaf) function | A function that calls others; must save `ra` and build a frame | `find_max` calling `max2` |
| Stack frame | The block of stack a function allocates for saved registers/locals | `addi sp,sp,-16` then `sd` |
| 16-byte alignment | ABI rule that `sp` is a multiple of 16 at call boundaries | round 24 bytes up to 32 |
| Caller-saved | Registers a call may destroy: `a0`–`a7`, `t0`–`t6`, `ra` | save `t0` before a call |
| Callee-saved | Registers a call preserves: `s0`–`s11`, `sp` | save `s0` on entry |
| Doubleword (DW) | 8 bytes; size of a pointer/`ra` on RV64; use `sd`/`ld` | `sd ra, (sp)` |
| Prologue / Epilogue | The save/allocate code on entry and restore/deallocate code on exit | `addi sp,sp,-16; sd ra,(sp)` ... `ld ra,(sp); addi sp,sp,16` |
Practice Problems¶
Problem 1: Trace ra and pc¶
foo is at address 0x1000. Inside it, call bar is the instruction at 0x1010, and bar starts at 0x2000. What are ra and pc immediately after the call executes?
Solution: `call` does two things: it sets `ra = pc + 4 = 0x1010 + 4 = 0x1014`, and it sets `pc` to the address of `bar`, `0x2000`. So after the `call`: `pc = 0x2000` (executing `bar`) and `ra = 0x1014` (where `bar` will return to via `ret`).
Problem 2: Will this function return correctly?¶
g is called from main. Does g return correctly to main? If not, fix it.
Solution: **No.** `g` is a non-leaf function but never saves `ra`. When `g` executes `call helper`, `ra` is overwritten with an address inside `g` (just after the `call`). After `helper` returns and `g` reaches its own `ret`, `pc = ra` jumps back into `g` instead of `main`, which produces an infinite loop. Fix: give `g` a 16-byte frame, save `ra` with `sd ra, (sp)` in the prologue, and reload it with `ld ra, (sp)` before deallocating the frame and executing `ret`.
Problem 3: How big a frame?¶
A function needs to preserve ra, s0, and s1. How many bytes should it allocate, and write the prologue.
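Solution: Three doublewords = 3 × 8 = **24 bytes** needed. Round **up to the next multiple of 16 → 32 bytes**. The prologue is `addi sp, sp, -32`, then `sd ra, (sp)`, `sd s0, 8(sp)`, `sd s1, 16(sp)`. The matching epilogue reloads the same three slots (`ld s1, 16(sp)`, `ld s0, 8(sp)`, `ld ra, (sp)`), then runs `addi sp, sp, 32` and `ret`.
Problem 4: Caller-saved across a call¶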
Assume square is correct. Is t0 guaranteed to still be 5 after the call? How would you make this robust?
Solution: **Not guaranteed.** `t0` is caller-saved, so `square` is allowed to overwrite it. If `square` uses `t0` internally, the final `add` uses garbage. Robust fix: keep the value in a callee-saved register such as `s0` (which the callee must preserve), and save/restore that register in this function's frame. The function must also save `ra`, because it makes a call.
Problem 5: Spot the stack bug¶
sumpair:
addi sp, sp, -16
sd ra, (sp)
call get_a # a0 = some value
mv s0, a0
call get_b # a0 = some value
add a0, a0, s0
ld ra, (sp)
addi sp, sp, 16
ret
This function uses s0. What is wrong, and what could go wrong for the caller?
Solution: `sumpair` writes to `s0` but **never saves or restores it**. `s0` is callee-saved, so `sumpair` is obligated to preserve it for its caller. By clobbering `s0`, it silently corrupts whatever value its caller had stored there. Fix: save and restore `s0` too. The 16-byte frame already has room at offset 8, so add `sd s0, 8(sp)` to the prologue and `ld s0, 8(sp)` to the epilogue.
Problem 6: Convert C to assembly¶
Translate this C function to ABI-correct RISC-V assembly. inc is an external function; assume it returns its argument plus one.
int inc(int x); // external, returns x + 1
int inc2(int x) {
return inc(inc(x)); // call inc twice
}
Solution: `inc2` is non-leaf (it calls `inc`), so it must save `ra`. The two calls chain through `a0`: the result of the first `inc` is already in `a0`, which is exactly the argument the second `inc` wants. No extra value needs to survive across the calls, so no `s` registers are required. The body is therefore a 16-byte frame holding `ra`, two back-to-back `call inc` instructions, and the matching epilogue. The only preservation needed here is `ra`, because the data naturally flows through `a0` from one call to the next.
Further Reading¶
- Source notes (handwritten): "/notes/CS315-01 2025-09-10 Lab RISC-V Assembly 4.pdf"
- Course RISC-V reference: /guides/riscv/
- Course key concepts: /guides/key-concepts/
- GDB usage guide: /guides/gdb-usage/
- Lab03 spec: /assignments/lab03/
- RISC-V Calling Convention (psABI)
- Beej's Quick Guide to GDB
Summary¶
- A function call is a controlled jump that remembers where to return. `call` sets `ra = pc + 4` and jumps to the target; `ret` sets `pc = ra`.
- There is only one `ra` register, so any function that itself makes a call must save its own `ra` before the call and restore it before `ret`; otherwise it returns to the wrong place (often an infinite loop).
- The stack grows downward and is managed through `sp`. Allocate with `addi sp, sp, -N` and deallocate with `addi sp, sp, N`; the two must match exactly.
- Keep `sp` 16-byte aligned at call boundaries: choose the bytes you need, then round up to a multiple of 16. On RV64 a pointer/`ra` is an 8-byte doubleword, so use `sd`/`ld`.
- Caller-saved (`a0`–`a7`, `t0`–`t6`, `ra`) may be destroyed by a call; callee-saved (`s0`–`s11`, `sp`) are preserved across a call. Save only the registers you actually need to survive.
- The prologue/epilogue template (allocate frame, save `ra` and used `s` registers, do the body, restore, deallocate, `ret`) is the backbone of every complex function.
- Write the logic first, then add preservation, and verify with `gdb`: confirm `sp` and `ra` return to their entry values, proving the stack is balanced and the return address is intact.