Lab: RISC-V Assembly Part 4 — Functions and the Stack

Overview

This hands-on lab session moves from writing single, self-contained ("leaf") functions to writing complex functions — functions that call other functions. To do this safely on RISC-V we have to understand three intertwined mechanisms: how the call and ret instructions move control between functions using the program counter (pc) and return address (ra); how the stack gives each function scratch space that survives across calls; and the calling convention that decides which registers a caller must protect and which a callee must protect. We build up a complete prologue/epilogue template, walk through it in gdb, and connect everything back to the min and find_max problems in Lab03.

Learning Objectives

  • Explain what the call and ret instructions do to pc and ra at the hardware level
  • Describe why nested function calls require saving and restoring the return address (ra)
  • Allocate and deallocate stack space correctly using addi sp, sp, -N / addi sp, sp, N
  • Apply the 16-byte stack-alignment rule and explain why it matters
  • Distinguish caller-saved from callee-saved registers and decide which to spill
  • Write a correct prologue and epilogue for a non-leaf function
  • Use gdb to single-step through a call/return and verify the stack pointer is balanced
  • Convert a working C function with helper calls into correct, ABI-compliant RISC-V assembly

Prerequisites

  • RISC-V instruction basics: add, addi, li, mul, branches, j, labels (Lab02 / Assembly Parts 1–3)
  • Memory instructions: ld/sd (doubleword), lw/sw (word), and base+offset addressing
  • The fetch/decode/execute cycle and the role of the pc
  • C functions, arguments, return values, and the notion of a call stack
  • Comfort building and running programs with make and stepping with gdb

1. From Leaf Functions to Complex Functions

In the earlier assembly labs every function was a leaf function: it did some arithmetic and returned, without calling anything else. A leaf function is easy — it never touches ra after entry and usually needs no stack frame at all:

# Leaf function: a0 = a0 + a1, then return
.global add_two
add_two:
    add a0, a0, a1
    ret

A complex function (the instructor's term for a non-leaf function) is one that calls at least one other function. The moment a function makes a call, two new problems appear:

  1. The call instruction overwrites ra. If we are inside a function that was itself called, our own return address gets clobbered.
  2. Any value we are keeping in a caller-saved register can be destroyed by the function we call.

The rest of this lab is about solving those two problems systematically. The diagram below is the mental model the instructor drew: a caller foo transferring control to a callee bar and getting control back.

sequenceDiagram
    participant foo as foo (caller)
    participant bar as bar (callee)
    Note over foo: pc advances: add, addi (pc = pc + 4)
    foo->>bar: call bar  (ra = pc+4, pc = addr of bar)
    Note over bar: mul, add, ...
    bar-->>foo: ret  (pc = ra)
    Note over foo: continues at instruction after the call

The key insight: a function call is just a controlled jump that remembers where to come back to. That "where to come back to" lives in ra.


2. The Program Counter and Return Address

Two registers drive all control flow:

Register         ABI name  x-name  Role
Program counter  pc        (none)  Address of the instruction currently being executed; normally advances by 4 each instruction
Return address   ra        x1      Where control should resume after the current function returns

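Normal straight-line execution simply does pc = pc + 4 after each 4-byte instruction. (If the compressed "C" extension is enabled, the assembler may emit some instructions as 2-byte encodings, in which case pc advances by 2; the examples in this lab assume 4-byte instructions.) A branch or jump sets pc to some other address. A function call and return are built from these primitives.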

What call does

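call bar is a pseudo-instruction that behaves like jal ra, bar (jump and link). Strictly, the assembler may expand it into an auipc/jalr pair, which the linker can relax back to a single jal; either way ra ends up holding the address of the instruction after the call. In the single-jal form it performs two updates in one instruction: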

call:
    1) ra = pc + 4      # remember the instruction AFTER the call
    2) pc = addr of bar # jump to the target function (its label)

Step 1 is the "link" part: it stashes the address of the next instruction into ra so the callee knows where to return. Step 2 is the "jump" part: control transfers to the first instruction of bar.

What ret does

ret is a pseudo-instruction for jalr x0, ra, 0 (jump to the address in ra, discarding any link). It performs a single update:

ret:
    1) pc = ra          # jump back to wherever ra points

So ret just copies ra into pc. That is the entire return mechanism. If ra holds the right address, you return to the right place. If something corrupted ra, you return to the wrong place — often an infinite loop or a crash.

caller foo:                         callee bar:
  pc -> add                           pc -> mul
        addi   (pc = pc + 4)                add
         .                                   .
         .                                   .
  pc -> call bar  --------------------->     .
  ra = pc + 4                                .
         .          <------------------- ret  (pc = ra)
         .
        ret  (pc = ra of foo's caller)

Mapping to the handwritten diagram: the long arrow from call bar over to mul is step 2 of call (pc jumps to bar). The arrow coming back from bar's ret to just below call bar is ret setting pc = ra.
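The two register updates can be modeled in a few lines of C. This is a minimal sketch, not a real simulator: the Cpu struct and the sim_call/sim_ret helpers are hypothetical names, and the model assumes every call is a single 4-byte jal ra, target with no compressed instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model holding only the two registers call/ret touch. */
typedef struct {
    uint64_t pc; /* address of the instruction being executed */
    uint64_t ra; /* x1, the return address */
} Cpu;

/* call target, modeled as a single 4-byte jal ra, target */
void sim_call(Cpu *cpu, uint64_t target) {
    assert(target % 4 == 0);   /* assumes 4-byte instructions (no RVC) */
    cpu->ra = cpu->pc + 4;     /* 1) link: remember the instruction after the call */
    cpu->pc = target;          /* 2) jump: go to the callee's first instruction */
}

/* ret, i.e. jalr x0, ra, 0 */
void sim_ret(Cpu *cpu) {
    cpu->pc = cpu->ra;         /* the entire return mechanism */
}
```

With the numbers from Practice Problem 1 (a call at 0x1010 to a callee at 0x2000), sim_call leaves ra = 0x1014 and pc = 0x2000, and sim_ret then sends pc back to 0x1014. Nothing in sim_ret checks whether ra still holds the right value, which is exactly the weakness the next section exposes.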


3. Why Nested Calls Break: The ra Clobber Problem

Here is the trap. There is exactly one ra register. Suppose main calls foo, and foo calls bar:

flowchart LR
    M[main] -->|call foo<br/>ra = return into main| F[foo]
    F -->|call bar<br/>ra = return into foo| B[bar]
    B -.ret -> pc = ra.-> F
    F -.ret -> pc = ra.-> M

Trace what happens to ra if foo does not protect it:

main:  call foo        # ra = (address in main, just after this call)
foo:   ...
       call bar        # ra = (address in foo, just after this call)  <-- OVERWRITES main's ra!
       ...
       ret             # pc = ra -> jumps back into foo (just after call bar), NOT into main!

After foo executes call bar, the original ra (the address back into main) is gone — it was overwritten with the address inside foo. When bar later does ret, it correctly returns into foo. But when foo then does ret, ra still points back into foo (or wherever the last call left it), not into main. The program returns to the wrong place. A common symptom is an infinite loop: foo returns into itself.

The fix: because ra is a caller-saved register (Section 6), any function that makes a call must save its own ra before that call and restore it before ret. The natural place to keep it is the stack, because the stack gives every active function its own private storage. (Parking ra in an s register only moves the problem: that s register must then be saved on the stack.)

This is exactly why leaf functions need no stack frame (they never call anything, so their ra is never clobbered) while complex functions almost always do.
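The clobber is easy to reproduce in a straight-line C sketch that tracks only ra. The addresses (main's call at 0x100, foo's call at 0x210) are made up for illustration, and the saved_ra variable is a hypothetical stand-in for foo's stack slot.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trace main -> foo -> bar, tracking only ra. Made-up addresses:
 * main's "call foo" is at 0x100 and foo's "call bar" is at 0x210.
 * Returns the address that foo's final ret jumps to.
 */
uint64_t trace_foo_return(int foo_saves_ra) {
    uint64_t ra;
    uint64_t saved_ra = 0;      /* stand-in for foo's stack slot */

    ra = 0x100 + 4;             /* main: call foo -> ra = 0x104 (back into main) */
    if (foo_saves_ra)
        saved_ra = ra;          /* foo prologue: sd ra, (sp) */

    ra = 0x210 + 4;             /* foo: call bar -> ra = 0x214, main's ra is gone */
    uint64_t pc = ra;           /* bar: ret -> pc = ra */
    assert(pc == 0x214);        /* bar correctly lands back inside foo */

    if (foo_saves_ra)
        ra = saved_ra;          /* foo epilogue: ld ra, (sp) */
    return ra;                  /* foo: ret -> pc = ra */
}
```

trace_foo_return(0) returns 0x214, an address inside foo itself (the infinite-loop symptom), while trace_foo_return(1) returns 0x104, the correct spot in main.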


4. The Stack: Layout and the Stack Pointer

The stack is a region of memory used for temporary, per-function storage: saved registers, the return address, and local variables that do not fit in registers. It is managed entirely through the stack pointer, sp (x2).

Direction and the stack pointer

  • sp always points to the current top of the stack — the lowest in-use address.
  • The stack grows downward (toward smaller addresses). To make room you subtract from sp; to release room you add back.
            Memory
        +-----------------+   higher addresses
        |/////////////////|   <- previously used / caller's frame
   sp ->+-----------------+ ----.
        |      int        |     |  4 int slots (16 bytes)
        +-----------------+     |  of scratch space this
        |      int        |     |  function allocated
        +-----------------+     |
        |      int        |     |
        +-----------------+     |
        |      int        |     |
   sp ->+-----------------+ ----'   <- new top after allocation
        |                 |
        |   (free)        |   lower addresses
        +-----------------+

In the handwritten diagram, the upper sp arrow is before allocation and the lower sp arrow is after allocation: sp moved down by 16 bytes to open up four 4-byte int slots. The brace on the right marks the bytes this function carved out for itself.

Allocation and deallocation

There are exactly two operations, and they are mirror images:

# Stack allocation: make room (move sp DOWN by 16 bytes)
addi sp, sp, -16

# ... use the 16 bytes via offsets from sp ...

# Stack deallocation: give it back (move sp UP by 16 bytes)
addi sp, sp, 16

The golden rule the instructor stressed: whatever you subtract on entry, you must add back on exit. If allocation and deallocation do not match exactly, sp is left pointing at the wrong place when you ret, and the stack is corrupted for every function above you. Note that sp itself is never stored in memory — it is a register you adjust by arithmetic; you preserve it simply by accounting for every byte you take and give back.
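A small C model makes the bookkeeping concrete. It is a sketch under simplifying assumptions: a hypothetical 256-byte array stands in for memory, sp is a byte offset into it, and the helper names (addi_sp, stack_sd, stack_ld) are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 256 };        /* size of the modeled stack region */

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint64_t sp;                   /* lowest in-use byte; the stack grows downward */
} Stack;

void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_BYTES;           /* empty stack: sp starts at the highest address */
}

/* addi sp, sp, imm: negative imm allocates, positive imm deallocates */
void addi_sp(Stack *s, int64_t imm) {
    s->sp = (uint64_t)((int64_t)s->sp + imm);
    /* stay inside the modeled region; going below 0 wraps to a huge value */
    assert(s->sp <= STACK_BYTES);
}

/* sd value, off(sp): store a doubleword at sp + off */
void stack_sd(Stack *s, uint64_t value, uint64_t off) {
    assert(s->sp + off + 8 <= STACK_BYTES);
    memcpy(&s->mem[s->sp + off], &value, sizeof value);
}

/* ld off(sp): load the doubleword at sp + off */
uint64_t stack_ld(const Stack *s, uint64_t off) {
    uint64_t value;
    assert(s->sp + off + 8 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->sp + off], sizeof value);
    return value;
}
```

Allocating 16, storing, and deallocating 16 brings sp back to its entry value. Pairing addi_sp(s, -16) with addi_sp(s, 32) instead leaves sp 16 bytes above where it started, which is the mismatch the golden rule forbids.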


5. The 16-Byte Alignment Rule

sp must be a multiple of 16 at every call boundary. That is why even when we only need 8 bytes (one doubleword for ra), we allocate 16:

addi sp, sp, -16     # allocate 16 even if we only need 8
sd   ra, (sp)        # ra in the bottom 8 bytes; top 8 bytes are padding

Why 16 and not 8?

  • The RISC-V ABI requires 16-byte stack alignment so that compiled code and the C library can assume aligned addresses for stack data, including 16-byte values such as RV64's 128-bit long double.
  • Following the rule unconditionally is defensive: even if your function would technically work misaligned, a C function you call might crash. Always assume the worst case when you call into library code.

So the practical recipe is: pick the number of bytes you actually need, then round up to the next multiple of 16.

Bytes needed            Round up to   addi on entry
8  (just ra)            16            addi sp, sp, -16
16 (ra + one s reg)     16            addi sp, sp, -16
24 (ra + two s regs)    32            addi sp, sp, -32
40 (ra + four s regs)   48            addi sp, sp, -48
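The round-up rule is one line of integer arithmetic. The sketch below is only an illustration (round_up16 and frame_bytes are hypothetical helper names), and it assumes each saved register is an 8-byte doubleword, as on RV64.

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte count up to the next multiple of 16 (the RISC-V stack alignment). */
size_t round_up16(size_t bytes) {
    return (bytes + 15) & ~(size_t)15;
}

/* Frame size for ra (if saved) plus n_sregs callee-saved registers, 8 bytes each. */
size_t frame_bytes(int saves_ra, size_t n_sregs) {
    size_t needed = 8 * ((saves_ra ? 1 : 0) + n_sregs);
    size_t frame = round_up16(needed);
    assert(frame % 16 == 0 && frame >= needed);
    return frame;
}
```

frame_bytes(1, 0) gives 16 (just ra), frame_bytes(1, 2) gives 32, and frame_bytes(1, 4) gives 48, matching the rows of the table. Adding 15 and then clearing the low four bits works because 16 is a power of two.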

On a 64-bit machine a pointer (and ra) is 8 bytes — a doubleword (DW). We therefore use sd (store doubleword) and ld (load doubleword), not sw/lw, when saving registers.


6. RISC-V Registers and the Calling Convention

There are only 32 registers and every function shares them, so we need a convention that says who is responsible for preserving what. Here is the register map the instructor wrote out:

x-name            ABI name   Purpose
x0                zero       Always reads as 0; writes are ignored
x1                ra         Return address
x2                sp         Stack pointer
x3                gp         Global pointer ("not going to use")
x4                tp         Thread pointer ("not going to use")
x10–x17           a0–a7      Arguments and return values
x5–x7, x28–x31    t0–t6      Temporaries
x8–x9, x18–x27    s0–s11     Saved registers

Caller-saved vs callee-saved

This is the heart of the convention. The question each class answers is: "After a call, can I trust this register still holds my value?"

Caller-saved (volatile): a0-a7, t0-t6, ra
    -> A call MAY destroy these. If the CALLER needs a value after
       the call, the caller must save it first.

Callee-saved (preserved): s0-s11, sp
    -> A call WILL leave these intact. If the CALLEE wants to use one,
       the callee must save the old value on entry and restore on exit.
flowchart TD
    subgraph Caller["Caller's responsibility"]
        C1["a0-a7 (arguments / returns)"]
        C2["t0-t6 (temporaries)"]
        C3["ra (return address)"]
    end
    subgraph Callee["Callee's responsibility"]
        D1["s0-s11 (saved registers)"]
        D2["sp (stack pointer)"]
    end
    Caller -.->|"save BEFORE a call<br/>if needed afterward"| Stack[(Stack)]
    Callee -.->|"save ON ENTRY,<br/>restore BEFORE ret"| Stack

Two practical consequences:

  • You only spill the registers you actually need across the call. If you do not care about t0's value after a call, do not save it. The convention tells you what might change, not what you must save.
  • ra is caller-saved, which is the formal reason a non-leaf function must save its own ra: the moment it calls another function it becomes a caller itself, so it is responsible for preserving ra across that inner call.

Class            Registers  Preserved by  When to save
Temporaries      t0–t6      Caller        Before a call, only if you need the value after it
Args / returns   a0–a7      Caller        Before a call, only if you need the value after it
Return address   ra         Caller        In any function that makes a call
Saved            s0–s11     Callee        On entry, only if your function uses that s register
Stack pointer    sp         Callee        Always balanced via matching add/sub
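The rules in the table can be written as a small decision function. This is a hedged sketch rather than part of any tool: is_callee_saved, is_caller_saved, and prologue_saves are hypothetical names, register names are assumed to be valid ABI names, and spilling a caller-saved register (such as t0) around a call is deliberately not counted as prologue work.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Callee-saved per the convention: s0-s11 and sp (assumes a valid ABI name). */
bool is_callee_saved(const char *reg) {
    if (strcmp(reg, "sp") == 0)
        return true;
    return reg[0] == 's' && reg[1] >= '0' && reg[1] <= '9';
}

/* Caller-saved per the convention: ra, a0-a7, t0-t6 (tp is neither). */
bool is_caller_saved(const char *reg) {
    if (strcmp(reg, "ra") == 0)
        return true;
    return (reg[0] == 'a' || reg[0] == 't') && reg[1] >= '0' && reg[1] <= '9';
}

/*
 * Must the prologue save reg?
 *   ra     -> only if the function makes a call
 *   sp     -> never stored; kept balanced by matched addi instead
 *   s0-s11 -> only if the function writes that register
 *   others -> no: caller-saved values are spilled around calls, if at all
 */
bool prologue_saves(const char *reg, bool writes_reg, bool makes_calls) {
    assert(reg != NULL);
    if (strcmp(reg, "ra") == 0)
        return makes_calls;
    if (strcmp(reg, "sp") == 0)
        return false;
    return is_callee_saved(reg) && writes_reg;
}
```

For foo in the next section (it writes s0 and calls bar), prologue_saves is true for ra and s0 and false for every other register, which is exactly what its prologue stores.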

7. The Full Function-Call Template

Putting Sections 2–6 together gives the canonical prologue / epilogue pattern. This is the exact example from the last handwritten page: foo saves ra and s0, calls bar, then restores and returns.

Stack frame layout

We need to save two doublewords — ra and s0 — so we allocate 16 bytes (already a multiple of 16):

            after  addi sp, sp, -16
        +-----------------+  <- sp + 16  (old sp, before the addi)
        |       s0        |  sp + 8: callee-saved register we use
        +-----------------+  <- sp + 8
        |       ra        |  sp + 0: our return address (a doubleword, DW)
        +-----------------+  <- sp + 0   (sp, the new top)

In the handwritten frame, s0 (blue) sits at sp+8 and ra (red) sits at sp, with the brace marked "DW" reminding us each slot is a doubleword (8 bytes) on the 64-bit machine. The old sp was at sp+8+8; after addi sp, sp, -16 the new sp is at the ra slot.

The code

.global foo
foo:
    addi sp, sp, -16      # prologue: allocate a 16-byte frame
    sd   ra, (sp)         # save our return address at offset 0
    sd   s0, 8(sp)        # save callee-saved s0 at offset 8

    # ... body: we may freely use s0 and may call other functions ...
    call bar              # ra is now clobbered, but our copy is safe on the stack

    # ... more body, using s0 etc. ...

    ld   s0, 8(sp)        # epilogue: restore s0
    ld   ra, (sp)         # restore our return address
    addi sp, sp, 16       # deallocate the frame (matches the -16 above)
    ret                   # pc = ra -> back to our caller

bar:
    add a0, a0, a1        # bar is a leaf here: does its work...
    # ...
    ret                   # pc = ra (the address foo's `call` stored)

Walk the control flow once:

foo: addi sp,sp,-16          callee bar:
     sd ra,(sp)                add a0,a0,a1
     sd s0,8(sp)               .
      .                        .
     call bar  -------------->  .
      .         <------------- ret   (pc = ra)
      .
     ld s0,8(sp)
     ld ra,(sp)
     addi sp,sp,16
     ret  (pc = ra)

bar is a leaf, so it does not allocate a frame or save ra — it just computes and returns. foo is the complex function, so it does all the prologue/epilogue work.
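To check the template end to end, here is a C sketch that runs foo's prologue, a call to bar, and the epilogue against a tiny modeled machine. The Machine struct, the made-up call-site address, and the value foo writes into s0 are all hypothetical; the point is the invariant at the end: sp, ra, and s0 are back to their entry values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state: a few registers plus a 128-byte stack region. */
typedef struct {
    uint64_t pc, ra, sp, s0, a0, a1;
    uint8_t mem[128];
} Machine;

static void sd_at(Machine *m, uint64_t val, uint64_t off) {   /* sd val, off(sp) */
    assert(m->sp + off + 8 <= sizeof m->mem);
    memcpy(&m->mem[m->sp + off], &val, sizeof val);
}

static uint64_t ld_at(const Machine *m, uint64_t off) {       /* ld off(sp) */
    uint64_t val;
    assert(m->sp + off + 8 <= sizeof m->mem);
    memcpy(&val, &m->mem[m->sp + off], sizeof val);
    return val;
}

static void bar(Machine *m) {      /* leaf: add a0, a0, a1; ret */
    m->a0 = m->a0 + m->a1;
    m->pc = m->ra;
}

/* foo from the template; call_site is the (made-up) address of its "call bar" */
void foo(Machine *m, uint64_t call_site) {
    m->sp -= 16;                    /* addi sp, sp, -16 */
    sd_at(m, m->ra, 0);             /* sd   ra, (sp)    */
    sd_at(m, m->s0, 8);             /* sd   s0, 8(sp)   */

    m->s0 = 7;                      /* body: use s0 freely */
    m->ra = call_site + 4;          /* call bar: ra is clobbered ... */
    bar(m);
    assert(m->pc == call_site + 4); /* ... and bar returns into foo */

    m->s0 = ld_at(m, 8);            /* ld   s0, 8(sp)   */
    m->ra = ld_at(m, 0);            /* ld   ra, (sp)    */
    m->sp += 16;                    /* addi sp, sp, 16  */
    m->pc = m->ra;                  /* ret */
}
```

Deleting the sd_at/ld_at pair for ra would leave m.pc at call_site + 4 after foo returns, the same wrong-return bug described in Section 3.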

Why restore ra once at the end rather than right after each call?

The instructor noted you could reload ra immediately after each call, but restoring it once at the end of the function is cleaner and avoids overhead: you write the save/restore once around the whole body instead of around every individual call. Save on entry, restore on exit — one prologue, one epilogue.


8. Worked Example: A Function That Adds Four Numbers

The instructor built a complex function from a simple one: implement add4(a, b, c, d) using three calls to a leaf add2(x, y). This forces us to confront caller-saved registers, because the arguments c and d must survive across calls that overwrite a0/a1.

The C we are translating

int add2(int x, int y) {
    return x + y;
}

int add4(int a, int b, int c, int d) {
    int t = add2(a, b);   // t = a + b
    t = add2(t, c);       // t = (a+b) + c
    t = add2(t, d);       // t = (a+b+c) + d
    return t;
}

The trap

In add4, arguments arrive as a0=a, a1=b, a2=c, a3=d. But add2 returns its result in a0 and, like any callee, is free to clobber a0–a7. After the first call to add2, a2 and a3 (which held c and d) are caller-saved and may already be overwritten. We must protect anything we still need across each call.

The clean approach: copy the still-needed arguments into callee-saved s registers up front (which survive calls automatically), then make the three calls.

.global add4
# a0=a, a1=b, a2=c, a3=d  ;  result in a0
add4:
    addi sp, sp, -32        # frame for ra, s0, s1 -> round 24 up to 32
    sd   ra, (sp)
    sd   s0, 8(sp)
    sd   s1, 16(sp)

    mv   s0, a2             # s0 = c  (survives calls)
    mv   s1, a3             # s1 = d  (survives calls)

    call add2               # a0 = a + b      (a0=a, a1=b already in place)

    mv   a1, s0             # set up a1 = c
    call add2               # a0 = (a+b) + c

    mv   a1, s1             # set up a1 = d
    call add2               # a0 = (a+b+c) + d   <- final result in a0

    ld   s1, 16(sp)
    ld   s0, 8(sp)
    ld   ra, (sp)
    addi sp, sp, 32
    ret

# a0=x, a1=y ; result in a0  (leaf)
add2:
    add a0, a0, a1
    ret

Why s registers instead of spilling a2/a3 to the stack around each call? Because s0/s1 are callee-saved: after we save them once in the prologue, every call add2 is guaranteed to leave them untouched. We pay one save/restore per s register instead of a spill and reload around every call. This is the same "implement logic first, then add preservation" workflow the instructor recommended.
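The difference between parking c and d in s registers and leaving them in a2/a3 can be demonstrated in C by modeling the register file as an array and letting add2 exercise its right to trash caller-saved registers. This is an illustrative sketch: the Regs type, the register indices, and the garbage value -999 are hypothetical, and the prologue/epilogue that would preserve s0/s1 for add4's own caller is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file indexed by ABI role. */
enum { A0, A1, A2, A3, T0, S0, S1, NREGS };
typedef int64_t Regs[NREGS];

/* add2: a0 = a0 + a1. A callee MAY overwrite caller-saved registers,
   so this model deliberately trashes a1-a3 and t0 (the worst case). */
void add2(Regs r) {
    r[A0] = r[A0] + r[A1];
    r[A1] = r[A2] = r[A3] = r[T0] = -999;
}

/* BROKEN: reads c and d from a2/a3 after calls have been made. */
int64_t add4_naive(Regs r) {
    add2(r);            /* a0 = a + b; a2/a3 may now be garbage */
    r[A1] = r[A2];      /* mv a1, a2  (c is gone) */
    add2(r);
    r[A1] = r[A3];      /* mv a1, a3  (d is gone) */
    add2(r);
    return r[A0];
}

/* Mirrors the assembly: park c and d in s0/s1 before any call. */
int64_t add4_saved(Regs r) {
    r[S0] = r[A2];      /* mv s0, a2 */
    r[S1] = r[A3];      /* mv s1, a3 */
    add2(r);            /* a0 = a + b */
    r[A1] = r[S0];      /* mv a1, s0 */
    add2(r);            /* a0 = (a+b) + c */
    r[A1] = r[S1];      /* mv a1, s1 */
    add2(r);            /* a0 = (a+b+c) + d */
    return r[A0];
}
```

With a, b, c, d = 1, 2, 3, 4, add4_saved returns 10, while add4_naive returns -1995 because c and d were overwritten by the first add2 call. A real add2 would usually leave those registers alone, which is why this class of bug can hide until the callee changes.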


9. Workflow and Common Bugs

Adding calls increases complexity sharply because of caller-saved registers and stack management. The instructor's recommended order of operations:

  1. Write the correct logic first, ignoring preservation. Use registers freely as if you owned them all. Get the math right.
  2. Then add stack management: figure out which registers must survive calls, choose s registers or stack slots, and write the prologue/epilogue.
  3. Follow alignment rules: total frame size a multiple of 16; matched allocate/deallocate.

A useful review checklist when converting any C function to call-based assembly:

[ ] What are the function's arguments, and which must survive across calls?
[ ] Does this function call anything? If yes, it must save ra.
[ ] Which callee-saved (s*) registers will I use? Save/restore each.
[ ] Is my frame size a multiple of 16?
[ ] Does every `addi sp, sp, -N` have a matching `addi sp, sp, N`?
[ ] Is the return value in a0 before ret?
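Two items on that checklist (frame size a multiple of 16, every -N matched by +N) are mechanical enough to check with a few lines of C. The sketch below is a hypothetical helper, not an existing tool: it only recognizes the exact form addi sp, sp, N, and it assumes a function with a single prologue and a single epilogue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Sum every "addi sp, sp, N" in a function's lines. Returns true when the
 * net adjustment is zero and every allocation is a multiple of 16.
 * The net adjustment is also reported through net_out (if non-NULL).
 */
bool sp_adjustments_ok(const char *const lines[], int count, int *net_out) {
    int net = 0;
    bool aligned = true;
    for (int i = 0; i < count; i++) {
        int imm;
        if (sscanf(lines[i], " addi sp, sp, %d", &imm) == 1) {
            net += imm;
            if (imm < 0 && (-imm) % 16 != 0)
                aligned = false;     /* allocation breaks 16-byte alignment */
        }
    }
    if (net_out)
        *net_out = net;
    return net == 0 && aligned;
}
```

Run on the foo template it reports a net adjustment of 0; run on the mismatched example below (-16 then +32) it reports +16; an 8-byte frame fails the alignment check even though it is balanced.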

Bug pattern: forgetting to save ra

# BROKEN: non-leaf function that does not save ra
mysum:
    call helper      # clobbers ra with an address inside mysum
    add a0, a0, t0
    ret              # pc = ra -> jumps back into mysum -> infinite loop!

Bug pattern: mismatched stack adjustment

# BROKEN: allocate 16, deallocate 32
foo:
    addi sp, sp, -16
    ...
    addi sp, sp, 32   # sp ends 16 bytes ABOVE its entry value -> caller's sp-relative loads hit the wrong slots
    ret

Bug pattern: relying on a caller-saved register after a call

# BROKEN: t0 needed after the call but it is caller-saved
    li   t0, 42
    call something    # `something` may overwrite t0
    add  a0, a0, t0   # t0 might no longer be 42!

The fix for the last one is to keep the value in a callee-saved s register, or spill it to the stack around the call.


10. Connecting to Lab03: min and find_max

Lab03 (see the spec at /assignments/lab03/) asks for min, quadratic, sum_array, and find_max. Two of these are good candidates for the function-and-stack techniques from today.

min as inline logic vs a helper

min(a, b) can be done entirely inline with a branch — no calls, no stack, a pure leaf function:

.global min
# a0 = a, a1 = b ; return smaller in a0
min:
    blt a0, a1, done   # if a < b, a0 already holds the answer
    mv  a0, a1         # else answer is b
done:
    ret

This is the "ternary-like" / conditional-move pattern the instructor mentioned: pick one of two values based on a comparison. Because min calls nothing, it needs no frame.

find_max computed in a loop (leaf, no calls)

You can find the maximum inline, scanning the array with a loop — still a leaf function:

.global find_max
# a0 = int *arr, a1 = int n ; return max in a0
find_max:
    lw   t0, (a0)        # t0 = max = arr[0]
    li   t1, 1           # i = 1
loop:
    bge  t1, a1, fmdone  # if i >= n, done
    slli t2, t1, 2       # t2 = i * 4
    add  t2, a0, t2      # t2 = &arr[i]
    lw   t3, (t2)        # t3 = arr[i]
    bge  t0, t3, skip    # if max >= arr[i], skip
    mv   t0, t3          # max = arr[i]
skip:
    addi t1, t1, 1       # i++
    j    loop
fmdone:
    mv   a0, t0          # return max
    ret
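Following the "logic first" advice, it helps to have a C version of these two leaf functions to compare the assembly's results against. This is a sketch that mirrors the branch structure of the assembly above; like the assembly, find_max assumes n >= 1.

```c
#include <assert.h>

/* C mirror of min: if a < b keep a, else take b. */
int min(int a, int b) {
    return (a < b) ? a : b;      /* blt a0, a1, done; mv a0, a1 */
}

/* C mirror of the leaf find_max loop; assumes n >= 1. */
int find_max(const int *arr, int n) {
    assert(n >= 1);
    int max = arr[0];            /* t0 = max = arr[0] */
    for (int i = 1; i < n; i++)  /* t1 = i; bge t1, a1, fmdone */
        if (arr[i] > max)        /* bge t0, t3, skip skips when max >= arr[i] */
            max = arr[i];        /* mv t0, t3 */
    return max;                  /* mv a0, t0 */
}
```

With the array 1 2 99 3 4 (the same values passed to gdb in Section 11), find_max returns 99.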

find_max built on a max2 helper (non-leaf, needs a frame)

Alternatively, factor out max2(a, b) and call it in the loop. Now find_max becomes a complex function: it calls max2, so it must save ra, and it must keep the running max, the index, the base pointer, and the count across each call. The clean way is to park all four in callee-saved registers (s0–s3) and save those registers in the prologue:

.global find_max
# a0 = int *arr, a1 = int n (assumes n >= 1) ; return max in a0
find_max:
    addi sp, sp, -48       # ra + s0..s3 = 5 doublewords = 40 bytes -> round up to 48
    sd   ra, (sp)
    sd   s0, 8(sp)
    sd   s1, 16(sp)
    sd   s2, 24(sp)
    sd   s3, 32(sp)        # offset 40 is padding for alignment

    mv   s0, a0            # s0 = base pointer (survives calls)
    mv   s1, a1            # s1 = n
    lw   s2, (s0)          # s2 = running max = arr[0]
    li   s3, 1             # s3 = i = 1 (callee-saved, so it survives every call)

fm_loop:
    bge  s3, s1, fm_done   # if i >= n, done
    slli t0, s3, 2         # t0 = i * 4 (a temporary is fine: not needed after the call)
    add  t0, s0, t0        # t0 = &arr[i]
    lw   a1, (t0)          # a1 = arr[i]      (second arg to max2)
    mv   a0, s2            # a0 = running max (first arg)
    call max2              # a0 = max2(running_max, arr[i])
    mv   s2, a0            # update running max
    addi s3, s3, 1         # i++
    j    fm_loop

fm_done:
    mv   a0, s2            # return running max
    ld   s3, 32(sp)
    ld   s2, 24(sp)
    ld   s1, 16(sp)
    ld   s0, 8(sp)
    ld   ra, (sp)
    addi sp, sp, 48        # matches the -48 above
    ret

# a0 = a, a1 = b ; return larger (leaf)
max2:
    bge a0, a1, m2done
    mv  a0, a1
m2done:
    ret

Note: because i lives in s3, the frame must hold ra plus four s registers: 5 × 8 = 40 bytes, rounded up to 48. Keeping i in the caller-saved t0 instead does not help, because max2 may destroy t0, so i would still need a safe home (an s register or a stack slot) across every call. This is the trade-off between using more s registers (bigger frame, simpler loop) and juggling temporaries. The loop version in the previous block (all leaf, no max2) is simpler still; the helper version exists to practice the call/stack mechanics.

The takeaway from the code review: decide up front which values must outlive your calls, and give those values callee-saved homes so the calling convention does the protecting for you.
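One way to decide which values need callee-saved homes is to write the helper version in C and look at which locals are still live when max2 is called. In the sketch below (find_max_via_max2 is a hypothetical name chosen so it does not clash with the leaf version), each such local is annotated with the s register it maps to.

```c
#include <assert.h>

/* C mirror of the leaf max2 helper. */
int max2(int a, int b) {
    return (a >= b) ? a : b;     /* bge a0, a1, m2done; mv a0, a1 */
}

/* Non-leaf find_max: every local that is live across the call needs an s register. */
int find_max_via_max2(const int *arr, int n) {
    assert(n >= 1);
    const int *base = arr;       /* live across the call: s0 */
    int count = n;               /* live across the call: s1 */
    int running = base[0];       /* live across the call: s2 */
    for (int i = 1; i < count; i++)            /* i is live across the call: s3 */
        running = max2(running, base[i]);      /* call max2 */
    return running;
}
```

The locals base, count, running, and i all cross the call, which is why four values need callee-saved homes (s0 through s3) and why ra must be saved as well.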


11. Debugging Call/Return in gdb

A gdb walkthrough is the fastest way to see the stack pointer move and confirm ra is preserved. Build with debug symbols and step instruction-by-instruction.

# assemble + link with debug info
gcc -g -static -o find_max main.c find_max.s
gdb ./find_max

Useful commands for stepping through a call:

(gdb) break find_max        # stop at function entry
(gdb) run 1 2 99 3 4        # run with command-line args
(gdb) info registers sp ra  # inspect the stack pointer and return address
(gdb) stepi                 # execute ONE instruction (si)
(gdb) x/4xg $sp             # examine 4 doublewords at the top of the stack
(gdb) p $sp                 # print sp as a value
(gdb) continue              # run to next breakpoint

What to verify, matching the gdb demo from class:

  1. At entry, note sp and ra.
  2. After the prologue (addi sp,sp,-N; sd ra,(sp)), confirm sp decreased by N and that the saved ra matches the entry ra (x/1xg $sp).
  3. Right after a call, confirm ra changed (it now points back into your own function) — proving why you needed to save it.
  4. After the epilogue (ld ra,(sp); addi sp,sp,N), confirm ra is restored to the entry value and sp is back to the entry value.
  5. At ret, confirm control returns to the correct caller.

If sp does not return to its entry value, your allocate/deallocate are mismatched. If ra is wrong at ret, you forgot to save or restore it.

flowchart TD
    A["Entry: record sp, ra"] --> B["Prologue: sp -= N, save ra/s*"]
    B --> C["Body + call: ra now clobbered"]
    C --> D["Epilogue: restore ra/s*, sp += N"]
    D --> E["ret: pc = ra"]
    E --> F{"sp back to entry?<br/>ra back to entry?"}
    F -->|yes| G["Correct return"]
    F -->|no| H["Stack/ra bug -> fix prologue/epilogue"]

Key Concepts

Concept | Definition | Example
call | Pseudo-instruction that behaves like jal ra, label: sets ra to the address of the next instruction (pc + 4 for a single jal), then jumps to the target | call bar
ret | Pseudo-instruction (jalr x0, ra, 0) that sets pc = ra | ret
Return address (ra) | Register x1 holding where to resume after ret; caller-saved | sd ra, (sp)
Stack pointer (sp) | Register x2 pointing at the current top of the (downward-growing) stack | addi sp, sp, -16
Leaf function | A function that calls nothing; needs no ra save and usually no frame | add a0,a0,a1; ret
Complex (non-leaf) function | A function that calls others; must save ra and build a frame | find_max calling max2
Stack frame | The block of stack a function allocates for saved registers/locals | addi sp,sp,-16 then sd
16-byte alignment | ABI rule that sp is a multiple of 16 at call boundaries | round 24 bytes up to 32
Caller-saved | Registers a call may destroy: a0–a7, t0–t6, ra | save t0 before a call
Callee-saved | Registers a call preserves: s0–s11, sp | save s0 on entry
Doubleword (DW) | 8 bytes; size of a pointer/ra on RV64; use sd/ld | sd ra, (sp)
Prologue / Epilogue | The save/allocate code on entry and the restore/deallocate code on exit | addi sp,sp,-16; sd ra,(sp) ... ld ra,(sp); addi sp,sp,16

Practice Problems

Problem 1: Trace ra and pc

foo is at address 0x1000. Inside it, call bar is the instruction at 0x1010 (assume it assembles to a single 4-byte jal ra, bar), and bar starts at 0x2000. What are ra and pc immediately after the call executes?

Solution: `call` does two things:
ra = pc + 4 = 0x1010 + 4 = 0x1014   # address of the instruction after the call
pc = address of bar = 0x2000        # jump to the target
So after the `call`: `pc = 0x2000` (executing `bar`) and `ra = 0x1014` (where `bar` will return to via `ret`).

Problem 2: Will this function return correctly?

helper:
    add a0, a0, a1
    ret

g:
    call helper
    addi a0, a0, 1
    ret

g is called from main. Does g return correctly to main? If not, fix it.

Solution: **No.** `g` is a non-leaf function but never saves `ra`. When `g` executes `call helper`, `ra` is overwritten with the address inside `g` (just after the `call`). After `helper` returns and `g` reaches its own `ret`, `pc = ra` jumps back into `g` instead of `main`, producing an infinite loop. Fix: save and restore `ra` around the call.
g:
    addi sp, sp, -16
    sd   ra, (sp)
    call helper
    addi a0, a0, 1
    ld   ra, (sp)
    addi sp, sp, 16
    ret

Problem 3: How big a frame?

A function needs to preserve ra, s0, and s1. How many bytes should it allocate, and write the prologue.

Solution: Three doublewords = 3 × 8 = **24 bytes** needed. Round **up to the next multiple of 16 → 32 bytes**.
    addi sp, sp, -32
    sd   ra, (sp)
    sd   s0, 8(sp)
    sd   s1, 16(sp)
    # offset 24 is padding for alignment
The matching epilogue:
    ld   s1, 16(sp)
    ld   s0, 8(sp)
    ld   ra, (sp)
    addi sp, sp, 32
    ret

Problem 4: Caller-saved across a call

    li   t0, 5
    li   a0, 10
    call square        # returns a0 = a0 * a0
    add  a0, a0, t0    # want result + 5
    ret

Assume square is correct. Is t0 guaranteed to still be 5 after the call? How would you make this robust?

Solution: **Not guaranteed.** `t0` is caller-saved, so `square` is allowed to overwrite it. If `square` uses `t0` internally, the final `add` uses garbage. Robust fix: keep the value in a callee-saved register (which the callee must preserve), saving/restoring it in this function's frame:
    addi sp, sp, -16
    sd   ra, (sp)
    sd   s0, 8(sp)
    li   s0, 5         # s0 survives the call
    li   a0, 10
    call square
    add  a0, a0, s0
    ld   s0, 8(sp)
    ld   ra, (sp)
    addi sp, sp, 16
    ret
Note we also had to save `ra` because this function now makes a call.

Problem 5: Spot the stack bug

sumpair:
    addi sp, sp, -16
    sd   ra, (sp)
    call get_a         # a0 = some value
    mv   s0, a0
    call get_b         # a0 = some value
    add  a0, a0, s0
    ld   ra, (sp)
    addi sp, sp, 16
    ret

This function uses s0. What is wrong, and what could go wrong for the caller?

Solution: `sumpair` writes to `s0` but **never saves or restores it**. `s0` is callee-saved, so `sumpair` is obligated to preserve it for its caller. By clobbering `s0`, it silently corrupts whatever value its caller had stored there. Fix: save and restore `s0` too (and the 16-byte frame already has room at offset 8):
sumpair:
    addi sp, sp, -16
    sd   ra, (sp)
    sd   s0, 8(sp)     # save callee-saved s0
    call get_a
    mv   s0, a0
    call get_b
    add  a0, a0, s0
    ld   s0, 8(sp)     # restore it
    ld   ra, (sp)
    addi sp, sp, 16
    ret

Problem 6: Convert C to assembly

Translate this C function to ABI-correct RISC-V assembly. inc is an external function; assume it returns its argument plus one.

int inc(int x);          // external, returns x + 1

int inc2(int x) {
    return inc(inc(x));  // call inc twice
}
Solution: `inc2` is non-leaf (it calls `inc`), so it must save `ra`. The two calls chain through `a0`: the result of the first `inc` is already in `a0`, which is exactly the argument the second `inc` wants. No extra value needs to survive across the calls, so no `s` registers are required.
.global inc2
# a0 = x ; return inc(inc(x)) in a0
inc2:
    addi sp, sp, -16
    sd   ra, (sp)
    call inc          # a0 = inc(x)
    call inc          # a0 = inc(inc(x))  (a0 carries the first result in)
    ld   ra, (sp)
    addi sp, sp, 16
    ret
The only preservation needed here is `ra`, because the data naturally flows through `a0` from one call to the next.


Summary

  1. A function call is a controlled jump that remembers where to return. call sets ra = pc + 4 and jumps to the target; ret sets pc = ra.

  2. There is only one ra register, so any function that itself makes a call must save its own ra before the call and restore it before ret — otherwise it returns to the wrong place (often an infinite loop).

  3. The stack grows downward and is managed through sp. Allocate with addi sp, sp, -N and deallocate with addi sp, sp, N; the two must match exactly.

  4. Keep sp 16-byte aligned at call boundaries: choose the bytes you need, then round up to a multiple of 16. On RV64 a pointer/ra is an 8-byte doubleword, so use sd/ld.

  5. Caller-saved (a0–a7, t0–t6, ra) may be destroyed by a call; callee-saved (s0–s11, sp) are preserved across a call. Save only the registers you actually need to survive.

  6. The prologue/epilogue template — allocate frame, save ra and used s registers, do the body, restore, deallocate, ret — is the backbone of every complex function.

  7. Write the logic first, then add preservation, and verify with gdb: confirm sp and ra return to their entry values, proving the stack is balanced and the return address is intact.