RISC-V Assembly Part 5: Functions and Recursion¶

Overview¶

This lecture brings together everything needed to write multi-function RISC-V assembly programs. We formalize the RISC-V calling convention, dividing the register file into caller-saved and callee-saved groups, and use that convention to build correct function prologues and epilogues. We then study non-leaf functions (functions that call other functions) using two complementary strategies, and finally apply the same rules to recursion, walking through a recursive factorial and visualizing how each call gets its own stack frame. These skills are exactly what Project 2 (findmaxfc, sort, fibrec) requires.

Learning Objectives¶

Explain how call and ret use the program counter and the return-address register ra
Classify every RISC-V register as caller-saved or callee-saved and explain who is responsible for preserving it
Write correct function prologues and epilogues that allocate, use, and free stack space in 16-byte-aligned frames
Choose the right load/store instruction (sd/ld, sw/lw, sb/lb) for the data size being saved
Implement non-leaf functions using both the caller-saved and the callee-saved approach
Trace recursive function execution through nested stack frames and identify the base case and recursive step
Translate compound conditionals such as a <= x && x <= b into RISC-V branches using inverse comparisons
Use GDB to single-step through recursive code and inspect the stack

Prerequisites¶

RISC-V registers, instructions, labels, and directives (Lab 02)
Control flow in assembly: beq, bne, blt, bge, j, and leaf functions
Memory model: byte-addressable memory, the stack grows downward, 64-bit doublewords are 8 bytes
C concepts: the call stack, local variables, recursion
Comfort with the autograder, make, and GDB from earlier labs

1. The Call/Return Mechanism¶

Why Functions Need Hardware Support¶

A function call must do two things:

Transfer control to the function's first instruction.
Remember where execution should resume when the function returns.

RISC-V handles the first part with a jump and the second part with a dedicated register, ra (the return address, also known as x1). There is no magic: a function "returning" is just the processor jumping back to the address stored in ra.

`call` and `ret`¶

Both are pseudo-instructions that the assembler expands into real instructions:

call target      # expands to: jal ra, target
                 #   1. ra = PC + 4   (address of the instruction AFTER call)
                 #   2. PC = target   (jump into the function)

ret              # expands to: jalr zero, ra, 0
                 #   1. PC = ra        (jump back to the caller)

jal means "jump and link": it links (saves) the return address in ra and then jumps. ret uses jalr (jump and link register) but discards the link by writing it to zero, so it simply sets PC = ra.

How `ra` Links Caller and Callee¶

Consider a caller at address 0x1000 calling bar:

Address    Instruction
0x1000     call bar          # ra <- 0x1004, then PC <- bar
0x1004     addi a0, a0, 1    # execution resumes HERE when bar returns
...
bar:
0x2000     add a0, a0, a1    # bar's body
0x2004     ret               # PC <- ra = 0x1004

This is exactly the picture from the handwritten notes: the call bar line is where PC points, and the moment the call executes, ra = PC + 4, so control comes back to the instruction right after the call.

flowchart LR
    A["foo: ... call bar"] -->|"ra = PC+4, jump"| B["bar: add a0,a0,a1"]
    B -->|ret sets PC = ra| C["foo: instruction after call"]
    style B fill:#f9f,stroke:#333,stroke-width:2px

The Danger: `ra` Is Just a Register¶

call overwrites ra every time. If foo calls bar, and foo was itself called (so foo's own return address is sitting in ra), then call bar destroys foo's return address. When foo later runs ret, it jumps back into itself instead of to its caller. This is the classic infinite loop bug the lecture warned about. The fix is to save ra on the stack before making any call, which leads directly to the prologue/epilogue pattern.

2. Caller-Saved vs Callee-Saved Registers¶

RISC-V has only 32 registers, shared by every function in the program. Because there is one physical register file, functions must agree on a convention for who preserves what across a call. The convention splits registers into two groups.

From the handwritten notes:

caller-saved regs:  a0-a7,  t0-t6,  ra
callee-saved regs:  s0-s11, sp

Caller-Saved (Temporary) Registers¶

The called function is free to clobber these. If the caller still needs a value after the call, the caller must save it (typically to its stack frame) before the call and restore it after.

a0–a7: argument and return-value registers
t0–t6: temporaries
ra: the return address (every call overwrites it)

Callee-Saved (Saved) Registers¶

The called function must leave these exactly as it found them. If a function wants to use an s register, it saves the old value in its prologue and restores it in its epilogue.

s0–s11: saved registers
sp: the stack pointer (preserved by allocating on entry and deallocating by the same amount on exit)

Summary Table¶

Class	Registers	Preserved by	Survives a call?	Notes
Arguments / return	`a0`–`a7`	Caller	No	`a0`/`a1` carry return values
Temporaries	`t0`–`t6`	Caller	No	Free to use in leaf functions
Return address	`ra`	Caller	No	Overwritten by every `call`
Saved	`s0`–`s11`	Callee	Yes	Save/restore if you touch them
Stack pointer	`sp`	Callee	Yes	Must stay 16-byte aligned
Fixed	`zero`, `gp`, `tp`	N/A	—	Do not clobber `gp`/`tp`

A Subtle Rule: Preserve What Your Callees Might Touch¶

The lecture stressed a point students often miss: you must preserve a register if a function you call might modify it, even if your own code does not. For example, if you keep a value you need in a2 and then call any function, that function is allowed to overwrite a2, so you must save a2 first. Conversely, you do not need to save caller-saved registers you will not read again after the call. Save only what you actually need to survive.

Argument Passing and Return Values¶

Argument	Register
1st	`a0`
2nd	`a1`
3rd	`a2`
...	...
8th	`a7`
9th and beyond	on the stack

The return value comes back in a0 (and a1 for a second 64-bit half if needed).

3. The Stack and the Prologue/Epilogue Pattern¶

Stack Direction and Alignment¶

The stack grows downward toward lower addresses. To allocate N bytes you subtract N from sp; to free them you add N back.

addi sp, sp, -16    # allocate 16 bytes (sp moves DOWN)
...
addi sp, sp, 16     # free 16 bytes (sp moves back UP)

Two hard rules:

sp must always be a multiple of 16. Even if you only need 8 bytes for ra, you allocate 16. Round the bytes you need up to the next multiple of 16.
Never write below sp. There is no red zone in this convention; allocate the space first, then store into it.

The Pattern¶

Every non-leaf function follows the same skeleton, seen in the foo example from the handwritten notes:

Prologue:  allocate stack space, save ra (and any s-registers you will use)
Body:      do work, including calls to other functions
Epilogue:  restore the saved registers, deallocate the stack, ret

The notes' foo looked like this on entry and exit:

.global foo
foo:
    addi sp, sp, -16    # prologue: allocate 16 bytes
    sd   ra, (sp)       # save ra on the stack

    # ... save any caller-saved regs needed across calls ...
    call bar            # ra is now bar's return addr; our saved ra is safe
    # ... restore caller-saved regs ...

    ld   ra, (sp)       # epilogue: restore ra
    addi sp, sp, 16     # deallocate
    ret                 # PC = ra (back to foo's caller)

Stack Frame Picture¶

The handwritten notes drew sp pointing at the top of the frame and ra stored at sp after the 16-byte allocation:

Higher addresses
+-----------------------------+  <- sp before prologue (caller's frame above)
|   ... caller's frame ...    |
+-----------------------------+  <- sp after  addi sp, sp, -16
|   ra (saved return address) |     stored by  sd ra, (sp)
|   (padding to reach 16)     |
+-----------------------------+  <- sp - 16 is the bottom of this frame
Lower addresses

Leaf Functions Need No Frame¶

A leaf function calls no other function. It never executes call, so ra is never overwritten and it does not need to save ra or allocate a frame at all. The project guide states this directly: functions that do not call other functions, and do not need to preserve caller-saved registers, do not need stack space.

# Leaf function: max of two ints, no stack frame needed
.global max2_s
max2_s:
    bge  a0, a1, max2_done   # if a0 >= a1, a0 already holds the max
    mv   a0, a1              # otherwise the max is a1
max2_done:
    ret

Choosing the Right Load/Store¶

Memory is byte-addressable, so the instruction must match the data size. The lecture reminded us to pick the right width:

Data size	Store	Load (signed)	Load (unsigned)
1 byte (`char`)	`sb`	`lb`	`lbu`
4 bytes (`int`, word)	`sw`	`lw`	`lwu`
8 bytes (doubleword, 64-bit)	`sd`	`ld`	—

Because registers and addresses are 64-bit, we use sd/ld to save and restore ra, sp-relative pointers, and s registers. We use sw/lw for 32-bit int values.

4. Non-Leaf Functions: The Caller-Saved Approach¶

A non-leaf function calls other functions, so it must (1) save ra, and (2) protect any caller-saved values it needs after each call. The first strategy keeps working values in the a/t registers and saves them on the stack around each call.

Worked Example: `add4f_s`¶

This was the running example in the notes (add4f_s with the stack diagram). It adds four numbers by calling a two-argument adder three times:

add4f(a, b, c, d) = add2( add2(a, b), add2(c, d) )

The C reference:

int add2(int a, int b) {
    return a + b;
}

int add4f(int a, int b, int c, int d) {
    return add2(add2(a, b), add2(c, d));
}

The handwritten stack diagram showed a 32-byte frame holding ra at offset 0 and the saved arguments above it (a2, a3, and a temporary), with sp at the top and sp-32 at the bottom:

        Stack (frame for add4f_s, 32 bytes)
sp     -> +-----------------+
 (24)     |   tmp0 (a + b)  |   sp + 24
 (16)     |   d  (a3)       |   sp + 16
  (8)     |   c  (a2)       |   sp + 8
  (0)     |   ra            |   sp + 0
sp-32  -> +-----------------+

The assembly, following the caller-saved approach:

.global add4f_s
.global add2_s

# Leaf: a0 = a0 + a1
add2_s:
    add  a0, a0, a1
    ret

# a0=a, a1=b, a2=c, a3=d ; returns a+b+c+d in a0
add4f_s:
    addi sp, sp, -32     # allocate a 32-byte frame
    sd   ra, 0(sp)       # save ra (we will call add2_s)

    sd   a2, 8(sp)       # save c (caller-saved; add2_s may clobber a2)
    sd   a3, 16(sp)      # save d

    call add2_s          # a0 = a + b   (a0,a1 already held a,b)
    sd   a0, 24(sp)      # save tmp0 = a + b

    ld   a0, 8(sp)       # a0 = c
    ld   a1, 16(sp)      # a1 = d
    call add2_s          # a0 = c + d

    mv   a1, a0          # a1 = c + d
    ld   a0, 24(sp)      # a0 = a + b
    call add2_s          # a0 = (a+b) + (c+d)

    ld   ra, 0(sp)       # restore ra
    addi sp, sp, 32      # deallocate
    ret

The key idea: before each call, anything in a/t registers that we still need afterward is written to the stack, and reloaded after the call returns, because the callee is free to overwrite those registers.

5. Non-Leaf Functions: The Callee-Saved Approach¶

The lecture showed a second approach for the same problem. Instead of bouncing values to the stack around every call, move them into s registers, which are guaranteed to survive calls. Then save and restore those s registers just once, in the prologue and epilogue.

.global add4f_callee_s

# a0=a, a1=b, a2=c, a3=d ; returns a+b+c+d in a0
# Callee-saved approach: park c, d, and (a+b) in s registers
add4f_callee_s:
    addi sp, sp, -32     # frame for ra + s0 + s1 + s2
    sd   ra, 0(sp)
    sd   s0, 8(sp)       # we are about to use s0..s2, so preserve them
    sd   s1, 16(sp)
    sd   s2, 24(sp)

    mv   s0, a2          # s0 = c  (survives calls)
    mv   s1, a3          # s1 = d  (survives calls)

    call add2_s          # a0 = a + b
    mv   s2, a0          # s2 = a + b (survives calls)

    mv   a0, s0          # a0 = c
    mv   a1, s1          # a1 = d
    call add2_s          # a0 = c + d

    mv   a1, a0          # a1 = c + d
    mv   a0, s2          # a0 = a + b
    call add2_s          # a0 = (a+b) + (c+d)

    ld   ra, 0(sp)       # restore everything
    ld   s0, 8(sp)
    ld   s1, 16(sp)
    ld   s2, 24(sp)
    addi sp, sp, 32
    ret

Comparing the Two Approaches¶

Aspect	Caller-saved	Callee-saved
Where working values live	On the stack	In `s` registers
Save/restore timing	Around each call	Once (prologue/epilogue)
Stack accesses	More (per call)	Fewer
Between calls	`ld`/`sd` to the stack	`mv` to/from `s` regs
Best when	One or two calls	Values survive many calls

Both are correct. The callee-saved approach is usually cleaner when several values must persist across multiple calls; the caller-saved approach is fine for a single call.

6. Recursion in Assembly¶

Recursion Is Nothing Special¶

The project guide makes the key point: a recursive function is just a function that calls itself, so the same rules apply. Save ra, save any caller-saved values you need after the recursive call, make the call, then restore. What makes recursion interesting is that each invocation gets its own stack frame, so each level keeps its own copy of n and its own ra.

Recursion is the natural tool for processing self-similar data such as trees and graphs, where each node leads to sub-problems of the same shape.

Worked Example: Recursive Factorial¶

The C version:

int factrec(int n) {
    if (n <= 0) {
        return 1;        // base case
    } else {
        return n * factrec(n - 1);   // recursive step
    }
}

The assembly:

.global factrec_s

# int factrec_s(int n)   a0 = n
factrec_s:
    addi sp, sp, -16
    sd   ra, 0(sp)          # save our return address

    bgt  a0, zero, factrec_rec   # if n > 0, do the recursive step
    li   a0, 1                   # base case: 0! = 1
    j    factrec_done

factrec_rec:
    sd   a0, 8(sp)          # save THIS frame's n on the stack
    addi a0, a0, -1         # a0 = n - 1
    call factrec_s          # a0 = factrec(n - 1)
    ld   t0, 8(sp)          # restore this frame's n
    mul  a0, a0, t0         # a0 = factrec(n-1) * n

factrec_done:
    ld   ra, 0(sp)          # restore return address
    addi sp, sp, 16
    ret

Two things must be preserved across the recursive call:

ra, because call factrec_s overwrites it (and without it, the recursion would loop forever).
n, because the recursive call overwrites a0 with n - 1. After the call returns, we need the original n to multiply. Each frame stores its own n at sp + 8.

Visualizing the Recursive Expansion¶

The handwritten notes traced factrec(4) "going down" into recursive calls and then multiplying on the way back "up":

factrec(4)
  = 4 * factrec(3)
        = 3 * factrec(2)
              = 2 * factrec(1)
                    = 1 * factrec(0)
                          = 1          <- base case
              <- 1 * 1 = 1
        <- 2 * 1 = 2
  <- 3 * 2 = 6
<- 4 * 6 = 24

The mul operations only happen as the calls return (unwind), because n * factrec(n-1) cannot be computed until factrec(n-1) has produced a value.

Stack Growth: One Frame Per Call¶

The notes drew the stack with a 16-byte frame for each level, each holding its own n and ra. For factrec(4):

Higher addresses
+---------------------------+
|  factrec(4): ra (main)    |  sp4 + 0
|  factrec(4): n = 4        |  sp4 + 8
+---------------------------+  <- sp4
|  factrec(3): ra (factrec) |  sp3 + 0
|  factrec(3): n = 3        |  sp3 + 8
+---------------------------+  <- sp3
|  factrec(2): ra (factrec) |  sp2 + 0
|  factrec(2): n = 2        |  sp2 + 8
+---------------------------+  <- sp2
|  factrec(1): ra (factrec) |  sp1 + 0
|  factrec(1): n = 1        |  sp1 + 8
+---------------------------+  <- sp1
|  factrec(0): ra (factrec) |  sp0 + 0
|  (base case, n not used)  |  sp0 + 8
+---------------------------+  <- sp0  (deepest point)
Lower addresses

Each frame's saved ra points back into the previous level (the notes labeled them "ra (main)", "ra (fact)", ...), forming the chain that lets the recursion unwind correctly. The base-case frame at sp0 still allocates 16 bytes and saves ra, but never reads n.

flowchart TD
    F4["factrec(4): n=4"] --> F3["factrec(3): n=3"]
    F3 --> F2["factrec(2): n=2"]
    F2 --> F1["factrec(1): n=1"]
    F1 --> F0["factrec(0): base case -> 1"]
    F0 -->|"1"| R1["return 1*1 = 1"]
    R1 -->|"1"| R2["return 2*1 = 2"]
    R2 -->|"2"| R3["return 3*2 = 6"]
    R3 -->|"6"| R4["return 4*6 = 24"]
    style F0 fill:#f9f,stroke:#333,stroke-width:2px
    style R4 fill:#9f9,stroke:#333,stroke-width:2px

Execution Trace for `factrec_s(4)`¶

Call	`n`	Saves	Recursive call returns	This call returns
`factrec_s(4)`	4	ra, n=4	`factrec_s(3)` -> 6	4 * 6 = 24
`factrec_s(3)`	3	ra, n=3	`factrec_s(2)` -> 2	3 * 2 = 6
`factrec_s(2)`	2	ra, n=2	`factrec_s(1)` -> 1	2 * 1 = 2
`factrec_s(1)`	1	ra, n=1	`factrec_s(0)` -> 1	1 * 1 = 1
`factrec_s(0)`	0	ra	(base case)	1

7. Tail Recursion¶

If a recursive function does its recursive call last and simply returns that result without further work, it is tail recursive. The project guide notes that in this case the recursive call can be turned into a plain jump (j), reusing the same stack frame instead of growing a new one each time.

factrec as written is not tail recursive: after factrec(n-1) returns, there is still a mul to perform, so the frame must stay alive. We can rewrite factorial with an accumulator to make it tail recursive:

int factacc(int n, int acc) {
    if (n <= 0) return acc;
    return factacc(n - 1, n * acc);   // tail call: nothing happens after it
}

.global factacc_s
# a0 = n, a1 = acc ; call initially with acc = 1
factacc_s:
    bgt  a0, zero, factacc_step
    mv   a0, a1            # base case: return acc
    ret
factacc_step:
    mul  a1, a1, a0        # acc = acc * n
    addi a0, a0, -1        # n = n - 1
    j    factacc_s         # tail call becomes a jump: no new frame, no ra save

Because nothing follows the recursive call, we need neither a stack frame nor a saved ra; the loop runs in constant stack space. This is the assembly-level reason compilers perform "tail-call optimization."

8. Translating Compound Conditionals¶

The lecture also covered translating C conditionals such as a range check into branches. RISC-V has only a handful of branch instructions (beq, bne, blt, bge), so we often invert the condition: instead of branching when the condition is true, we branch away when it is false.

Consider:

if (a <= x && x <= b) {
    // in range
} else {
    // out of range
}

We branch to else when the condition fails. The condition fails if x < a OR x > b:

# a0 = a, a1 = x, a2 = b
    blt  a1, a0, range_else   # x < a  -> out of range
    blt  a2, a1, range_else   # b < x  (i.e. x > b) -> out of range
    # ---- in range ----
    # ... body for true case ...
    j    range_end
range_else:
    # ... body for false case ...
range_end:

Useful pseudo-instruction equivalences for building these tests:

C comparison	RISC-V branch	Notes
`x < y`	`blt x, y, L`	branch if less than
`x <= y`	`bge y, x, L`	swap operands: `y >= x`
`x > y`	`blt y, x, L`	swap operands; `bgt` is a pseudo-instruction for this
`x >= y`	`bge x, y, L`	branch if greater or equal
`x == y`	`beq x, y, L`
`x != y`	`bne x, y, L`

bgt, ble, bgtu, bleu are pseudo-instructions the assembler implements by swapping the operands of blt/bge.

9. Debugging Recursion with GDB¶

The lecture recommended single-stepping through recursive code in GDB and drawing the stack frames. A few commands that help:

# Build with debug symbols, then run under gdb (on the RISC-V VM)
gdb ./fibrec
(gdb) break factrec_s        # stop at the function
(gdb) run 4                  # run with argument
(gdb) stepi                  # step one instruction at a time
(gdb) info registers a0 ra sp   # watch arguments, return address, stack pointer
(gdb) x/4gx $sp              # examine 4 doublewords at the current sp
(gdb) backtrace              # see the chain of recursive frames
(gdb) continue

Watching sp shrink by 16 on each recursive entry and grow back by 16 on each return makes the stack diagram from Section 6 concrete. Inspecting the saved ra at 0(sp) confirms that each level's return address is preserved, which is how you verify you did not introduce the infinite-loop bug.

Key Concepts¶

Concept	Definition	Example
Return address (`ra`)	Register holding the address to resume at after a call	`call bar` sets `ra = PC + 4`
`call`	Pseudo-instruction: save `PC+4` in `ra`, jump to target	`call add2_s`
`ret`	Pseudo-instruction: set `PC = ra`	`ret` at end of a function
Caller-saved registers	`a0`–`a7`, `t0`–`t6`, `ra`; may be clobbered by a callee	Save `a2` before a call if needed after
Callee-saved registers	`s0`–`s11`, `sp`; must be preserved by the callee	Save `s0` in prologue, restore in epilogue
Prologue / epilogue	Entry code that allocates+saves / exit code that restores+frees	`addi sp,sp,-16; sd ra,(sp)` ... `ld ra,(sp); addi sp,sp,16; ret`
Stack alignment	`sp` must always be a multiple of 16	Allocate 16 even to save only `ra`
Leaf function	Calls no other function; needs no frame	`max2_s`
Non-leaf function	Calls another function; must save `ra`	`add4f_s`
Recursion	Function that calls itself; one frame per call	`factrec_s(4)`
Tail recursion	Recursive call is the last action; becomes a jump	`factacc_s`

Practice Problems¶

Problem 1: Call/Return Mechanics¶

This code lives at address 0x4000:

0x4000: call helper      # helper is at 0x5000
0x4004: mul a0, a0, a1

After call helper executes, what are ra and PC?

Click to reveal solution

- `ra = 0x4004` (the address of the instruction immediately after the `call`) - `PC = 0x5000` (the address of `helper`) `call` expands to `jal ra, helper`: it stores `PC + 4 = 0x4004` into `ra`, then jumps so `PC = 0x5000`. When `helper` runs `ret`, `PC` is set back to `0x4004`.

Problem 2: What Must Be Saved?¶

A function holds a value it needs in a3, uses s1 for a loop counter, and calls another function in the middle. Which registers must be saved, by whom, and where?

Click to reveal solution

- **`ra`**: caller-saved, and this is a non-leaf function, so `call` will overwrite it. Save `ra` in the prologue, restore it in the epilogue. - **`a3`**: caller-saved. The callee may clobber `a3`. Since we need it after the call, **we** must save `a3` on the stack before the call and reload it after. - **`s1`**: callee-saved. Because we modify `s1`, we must save it in our prologue and restore it in our epilogue so our own caller sees it unchanged. `s1` will automatically survive the call we make (the callee is also required to preserve it). So the prologue saves `ra` and `s1`; `a3` is saved on the stack only around the call.

Problem 3: Frame Size¶

A function must save ra, s0, s1, s2, and one 4-byte int local. What frame size should it allocate, and where does each value go?

Click to reveal solution

Bytes needed: `ra` (8) + `s0` (8) + `s1` (8) + `s2` (8) + int (4) = **36 bytes**. Round up to the next multiple of 16: **48 bytes**.

addi sp, sp, -48
sd   ra, 0(sp)
sd   s0, 8(sp)
sd   s1, 16(sp)
sd   s2, 24(sp)
sw   t0, 32(sp)     # 4-byte int local
# bytes 36..47 are alignment padding

Problem 4: Find the Bug¶

.global compute
compute:
    addi sp, sp, -16
    sd   s0, 0(sp)
    mv   s0, a0
    call helper
    add  a0, a0, s0
    ld   s0, 0(sp)
    addi sp, sp, 16
    ret

What is wrong, and what symptom does it produce?

Click to reveal solution

`ra` is never saved. Because `compute` calls `helper`, the `call` overwrites `ra` with the address inside `compute`. When `compute` runs `ret`, it jumps back into itself (to the instruction after `call helper`) instead of returning to its caller — an **infinite loop**. Fix: save and restore `ra` (and keep `s0` saved too):

.global compute
compute:
    addi sp, sp, -16
    sd   ra, 0(sp)      # save ra
    sd   s0, 8(sp)      # save s0
    mv   s0, a0
    call helper
    add  a0, a0, s0
    ld   s0, 8(sp)
    ld   ra, 0(sp)      # restore ra
    addi sp, sp, 16
    ret

Problem 5: Recursive Fibonacci¶

Project 2's fibrec computes Fibonacci numbers recursively: fib(0)=0, fib(1)=1, fib(n)=fib(n-1)+fib(n-2). Write fibrec_s. This function makes two recursive calls, so think carefully about what must survive each call.

Click to reveal solution

.global fibrec_s
# int fibrec_s(int n)   a0 = n
fibrec_s:
    addi sp, sp, -16
    sd   ra, 0(sp)          # we call ourselves, so save ra

    li   t0, 2
    bge  a0, t0, fib_rec     # if n >= 2, recurse; else base case
    # base case: fib(0)=0, fib(1)=1  -> just return n (0 or 1)
    j    fib_done

fib_rec:
    sd   a0, 8(sp)          # save n
    addi a0, a0, -1         # a0 = n - 1
    call fibrec_s           # a0 = fib(n-1)

    ld   t0, 8(sp)          # restore n
    sd   a0, 8(sp)          # reuse slot: save fib(n-1)
    addi a0, t0, -2         # a0 = n - 2
    call fibrec_s           # a0 = fib(n-2)

    ld   t0, 8(sp)          # t0 = fib(n-1)
    add  a0, a0, t0         # a0 = fib(n-2) + fib(n-1)

fib_done:
    ld   ra, 0(sp)
    addi sp, sp, 16
    ret

The trick is that after the first recursive call we no longer need the original `n` (we already computed `n-1`), so we can reuse the `8(sp)` slot to hold `fib(n-1)` while we compute `fib(n-2)`. Then we add the two results. `fibrec_s(10)` returns 55, and `fibrec_s(20)` returns 6765, matching the project examples.

Problem 6: Range Check¶

Translate this C into RISC-V assembly. Assume a0 = x, a1 = lo, a2 = hi, and the result r goes in a0.

int r;
if (lo <= x && x <= hi) {
    r = 1;
} else {
    r = 0;
}

Click to reveal solution

Branch to the `else` path whenever the range test fails. It fails if `x < lo` OR `x > hi`:

# a0 = x, a1 = lo, a2 = hi ; result in a0
    blt  a0, a1, out_of_range   # x < lo  -> fail
    blt  a2, a0, out_of_range   # hi < x  (x > hi) -> fail
    li   a0, 1                  # in range
    j    range_end
out_of_range:
    li   a0, 0
range_end:
    ret

Note `blt a2, a0` tests `hi < x`, which is the same as `x > hi`. This is the "inverse comparison" pattern: because RISC-V lacks a direct `x > hi` branch on these operands without swapping, we branch when the condition is false and fall through when it is true.

Summary¶

Call and return rely on ra: call saves PC+4 into ra and jumps; ret sets PC = ra. Because call overwrites ra, any function that calls another must save ra first, or it will loop forever.
The calling convention splits the 32 registers into caller-saved (a0–a7, t0–t6, ra) and callee-saved (s0–s11, sp). Caller-saved registers can be clobbered by a callee; callee-saved registers must be preserved by the function that uses them.
Preserve only what you need, but remember you must protect a register if a function you call might modify it, even when your own code does not.
The prologue/epilogue pattern allocates a 16-byte-aligned stack frame, saves ra and any s registers used, runs the body, then restores and deallocates before ret. Leaf functions need no frame at all.
Two strategies for non-leaf functions: the caller-saved approach stores a/t values on the stack around each call; the callee-saved approach parks them in s registers and saves those once. The callee-saved approach is cleaner when values must survive many calls.
Recursion follows the same rules — it is just a function calling itself. Each call gets its own stack frame holding its own ra and its own copy of n, so the recursive factorial multiplies on the way back up as the frames unwind.
Tail recursion, where the recursive call is the last action, can be rewritten as a jump that reuses one frame, running in constant stack space.
Compound conditionals like lo <= x && x <= hi are best translated with inverse comparisons: branch away from the body when any part of the condition is false, and fall through when all parts hold.

RISC-V Assembly Part 5: Functions and Recursion¶

Overview¶

Learning Objectives¶

Prerequisites¶

1. The Call/Return Mechanism¶

Why Functions Need Hardware Support¶

call and ret¶

How ra Links Caller and Callee¶

The Danger: ra Is Just a Register¶

2. Caller-Saved vs Callee-Saved Registers¶

Caller-Saved (Temporary) Registers¶

Callee-Saved (Saved) Registers¶

Summary Table¶

A Subtle Rule: Preserve What Your Callees Might Touch¶

Argument Passing and Return Values¶

3. The Stack and the Prologue/Epilogue Pattern¶

Stack Direction and Alignment¶

The Pattern¶

Stack Frame Picture¶

Leaf Functions Need No Frame¶

Choosing the Right Load/Store¶

4. Non-Leaf Functions: The Caller-Saved Approach¶

Worked Example: add4f_s¶

5. Non-Leaf Functions: The Callee-Saved Approach¶

Comparing the Two Approaches¶

6. Recursion in Assembly¶

Recursion Is Nothing Special¶

Worked Example: Recursive Factorial¶

Visualizing the Recursive Expansion¶

Stack Growth: One Frame Per Call¶

Execution Trace for factrec_s(4)¶

7. Tail Recursion¶

8. Translating Compound Conditionals¶

9. Debugging Recursion with GDB¶

Key Concepts¶

Practice Problems¶

Problem 1: Call/Return Mechanics¶

Problem 2: What Must Be Saved?¶

Problem 3: Frame Size¶

Problem 4: Find the Bug¶

Problem 5: Recursive Fibonacci¶

Problem 6: Range Check¶

Further Reading¶

Summary¶

`call` and `ret`¶

How `ra` Links Caller and Callee¶

The Danger: `ra` Is Just a Register¶

Worked Example: `add4f_s`¶

Execution Trace for `factrec_s(4)`¶