RISC-V Assembly Part 3: Arrays and Functions
Overview
This lecture connects two ideas that dominate real assembly programming: arrays and functions. We first show that C array indexing (arr[i]) and pointer arithmetic (*(arr + i)) are exactly the same operation, then translate that operation into RISC-V using the offset + base address pattern with lw and sw. We then formalize simple (leaf) functions: how arguments arrive in the a registers, how a result leaves in a0, the eight-argument limit, and the rule that a leaf function may freely use the a and t registers without touching the stack. These are the foundations for Lab 3 (sum_array, find_max) and Project 2.
Learning Objectives
- Explain why `arr[i]` in C is exactly equivalent to `*(arr + i)` (pointer arithmetic).
- Describe how C scales pointer arithmetic by the element size automatically.
- Compute a byte offset (`index * element_size`) and add it to a base address to reach an array element in assembly.
- Use `lw` to read an array element and `sw` to write an array element, and state the direction of data flow for each.
- Translate a C array-access function into a correct RISC-V leaf function.
- Identify a leaf function and state why it needs no stack frame.
- Apply the RISC-V argument convention: arguments in `a0`–`a7` (eight or fewer), return value in `a0`.
- Use a temporary register to hold an intermediate address and avoid clobbering arguments prematurely.
Prerequisites
- RISC-V Assembly Parts 1 and 2: registers, instructions, immediates, control flow with branches and labels.
- C basics: arrays, pointers, the `&` (address-of) and `*` (dereference) operators.
- Memory model: byte-addressable memory, a 32-bit `int` (word) occupies 4 bytes, 64-bit registers.
- Familiarity with `lw`/`sw` and `li`/`mul`/`add` from the RISC-V reference guide.
1. Roadmap and Logistics
Today's session opened with three items written at the top of the notes:
- Arrays and Functions: the conceptual focus.
- Lab 3 Q & A: Lab 3 (`quadratic`, `min`, `sum_array`, `find_max`) is due tonight.
- Local VM setup: getting a RISC-V development environment running locally with QEMU and UTM so you are not dependent on the lab machines.
The technical heart of the lecture is two topics that build directly on the branches and loops from Part 2:
- Arrays — how to reach an element of an array in memory.
- Functions — the convention for passing arguments and returning a value.
Both topics feed directly into Lab 3's array problems and into Project 2, which adds real multi-function calling conventions on top of what we cover here.
2. Arrays and Pointers in C
The Two C Operators You Need
The notes call out two operators as hints for everything that follows:
- `&x`: the address-of operator. It produces a pointer: the memory address where `x` lives.
- `*p`: the dereference operator. Given a pointer `p`, it accesses the value stored at that address.
An array name like arr already is an address — specifically the address of element 0. That single fact is what makes array indexing and pointer arithmetic interchangeable.
Declaring an Array
The declaration `int arr[5];` reserves storage for five `int` values laid out contiguously in memory. Because an `int` is 4 bytes, the array occupies 20 bytes total, and the elements sit at consecutive addresses:
address:   arr+0   arr+4   arr+8   arr+12  arr+16
          +-------+-------+-------+-------+-------+
element:  |arr[0] |arr[1] |arr[2] |arr[3] |arr[4] |
          +-------+-------+-------+-------+-------+
index:        0       1       2       3       4
Notice the addresses jump by 4 each time, not by 1. The index counts elements; the address counts bytes. Converting between the two is the entire trick of array access in assembly.
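To make the index-versus-address distinction concrete, the following C sketch measures how many bytes separate element `i` from the start of an `int` array. The helper name `byte_offset_of` is an assumption introduced here for illustration; it is not part of the lab code.

```c
#include <stddef.h>

/* Byte distance from the start of the array to element i.
   Assumes i is a valid index for arr (or one past the last element). */
ptrdiff_t byte_offset_of(const int *arr, int i) {
    const char *base = (const char *)arr;      /* address of arr[0], counted in bytes */
    const char *elem = (const char *)&arr[i];  /* address of arr[i], counted in bytes */
    return elem - base;                        /* equals i * sizeof(int) */
}
```

Where `int` is 4 bytes, `byte_offset_of(arr, 3)` evaluates to 12, which matches the `arr+12` column in the diagram.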
Index Notation vs. Pointer Notation
The lecture put the two notations side by side. Every line on the left is exactly equivalent to the line on the right:
| Array (index) notation | Pointer notation | Meaning |
|---|---|---|
| `arr[0] = 99;` | `*arr = 99;` | store 99 into element 0 |
| `arr[1] = 100;` | `*(arr + 1) = 100;` | store 100 into element 1 |
| `x = arr[0];` | `x = *arr;` | load element 0 into x |
| `y = arr[1];` | `y = *(arr + 1);` | load element 1 into y |
The general identity, for any index `i`, is `arr[i]` ≡ `*(arr + i)`.
This is not an approximation — the C standard defines arr[i] to mean *(arr + i). The two forms compile to the same machine code.
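One way to convince yourself of the identity is to compare both forms directly in C. This is a small sketch; `same_element` is a hypothetical helper name, and it assumes `i` is a valid index for the array passed in.

```c
/* Returns 1 when index form and pointer form name the same address
   and read the same value; returns 0 otherwise. */
int same_element(int *arr, int i) {
    int same_address = (&arr[i] == arr + i);
    int same_value   = (arr[i] == *(arr + i));
    return same_address && same_value;
}
```

For any valid index the function returns 1, because the compiler treats the two spellings as the same expression.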
Pointer Arithmetic Scales by Element Size
Here is the part that trips everyone up the first time. When you write arr + 1 in C, you do not add 1 to the address. C scales the + 1 by the size of the element type. For int *, the element size is 4 bytes, so:
arr + 1 means (address of arr) + 1 * sizeof(int) = address + 4
arr + 2 means (address of arr) + 2 * sizeof(int) = address + 8
arr + i means (address of arr) + i * sizeof(int) = address + (i * 4)
C does this scaling silently because it knows the type of arr. Assembly does not know any types — to the processor, an address is just a number. That means we are responsible for the multiply. This is exactly the work the C compiler does for us, and it is what we now have to write by hand.
flowchart LR
A["arr + i (C)"] --> B["compiler knows<br/>sizeof(int) = 4"]
B --> C["address + (i * 4)"]
C --> D["machine address"]
style B fill:#f9f,stroke:#333,stroke-width:2px
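The scaling depends on the pointer's element type, not on the number written after the `+`. The sketch below measures, in bytes, how far `+ 1` moves a pointer for three element types. The function names are assumptions made for this example.

```c
#include <stddef.h>

/* Bytes that "+ 1" adds to an int pointer. */
size_t step_bytes_int(void) {
    int arr[2] = {0, 0};
    return (size_t)((char *)(arr + 1) - (char *)arr);
}

/* Bytes that "+ 1" adds to a char pointer. */
size_t step_bytes_char(void) {
    char arr[2] = {0, 0};
    return (size_t)((arr + 1) - arr);
}

/* Bytes that "+ 1" adds to a double pointer. */
size_t step_bytes_double(void) {
    double arr[2] = {0.0, 0.0};
    return (size_t)((char *)(arr + 1) - (char *)arr);
}
```

Each function returns the `sizeof` of its element type, so the same source text `arr + 1` produces a different address change for each array. Assembly has no such type information, which is why the multiply must be written explicitly.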
Challenge from class: take the `arr[i]` version of a function, rewrite it using only pointer arithmetic (`*(arr + i)`), and then translate that pointer version to assembly. The pointer form maps almost one-to-one onto the instructions, so doing this rewrite first makes the assembly far easier.
3. Array Access in Assembly: Offset + Base
The Question
The lecture framed array access around one small function. Imagine `main` has a local array `int arr[3]` living on the stack, and we want to evaluate `arr[i]` for some index `i`.
How does the processor find the right element when all it has is the base address of the array and the index i?
The Picture: Offset + arr
The handwritten diagram shows the three-element array on the stack and how an index maps to a memory location:
            STACK
          +---------+
arr+8 ->  | arr[2]  |   index 2
          +---------+
arr+4 ->  | arr[1]  |   index 1
          +---------+
arr+0 ->  | arr[0]  |   index 0
          +---------+
               ^
               |
target = offset + arr   (the base, held in a0)
The element we want lives at `address = arr + (i * 4)`, that is, the base address plus the byte offset.
That is the whole idea, highlighted in red in the notes: offset + arr. Compute the byte offset from the index, add it to the base address, and you have a pointer to the element. Then dereference it.
The C Function (Pointer Form)
Following the challenge, we write the access in pointer form so it maps cleanly to assembly:
// Return arr[i] using pointer arithmetic.
// a0 will hold arr (the base address), a1 will hold i.
int arr_get_c(int arr[], int i) {
return *(arr + i); // same as: return arr[i];
}
The Assembly Function
We are told how the arguments arrive: `a0` holds `arr` (the base address) and `a1` holds `i`. This register-to-parameter mapping is exactly what the calling convention guarantees.
Here is the translation. The comments restate what each register holds at each step:
.global arr_get_s
arr_get_s:
li t0, 4 # t0 = 4 (bytes per int = element size)
mul t1, a1, t0 # t1 = i * 4 (the byte offset)
add t2, a0, t1 # t2 = arr + offset (address of arr[i])
lw a0, (t2) # a0 = *t2 = arr[i] (load the element)
ret # return value is in a0
Step by step, mirroring the red annotations from the notes:
- `li t0, 4`: put the element size (4 bytes per `int`) into a temporary.
- `mul t1, a1, t0`: multiply the index `i` (`a1`) by 4 to get the byte offset. The note labels this `t1 = t0(4) * a1(i)`.
- `add t2, a0, t1`: add the base address `arr` (`a0`) to the offset. The note labels this `t2 = a0(arr) + t1(offset)`. Now `t2` is a pointer to `arr[i]`.
- `lw a0, (t2)`: load the word at that address into `a0`. This is the dereference `*(arr + i)`.
- `ret`: the result is already in `a0`, the return register.
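The same steps can be written in C with the automatic scaling switched off, which makes the correspondence to the instructions explicit. This is a sketch under assumptions: `arr_get_manual` is a hypothetical name, and the locals `t0`, `t1`, `t2` are ordinary C variables named after the registers. Casting to `char *` forces byte-level arithmetic so the multiply has to appear in the source.

```c
/* Mirrors arr_get_s one instruction at a time.
   Assumes i is a valid index for arr. */
int arr_get_manual(int arr[], int i) {
    long t0 = (long)sizeof(int);   /* li  t0, 4       element size (4 where int is 32 bits) */
    long t1 = (long)i * t0;        /* mul t1, a1, t0  byte offset                           */
    char *t2 = (char *)arr + t1;   /* add t2, a0, t1  address of arr[i], in bytes           */
    return *(int *)t2;             /* lw  a0, (t2)    load the element                      */
}
```

Calling `arr_get_manual(arr, 2)` on `{10, 20, 30}` reads the third element, the same result as `arr[2]`.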
Why lw a0, (t2) and not lw a0, t2
The parentheses matter. `lw a0, (t2)` means "treat the value in `t2` as a memory address and load the word stored there." Without the parentheses you would be referring to the register's contents as data, which is not what a load does. The general form is `lw rd, offset(rbase)`: the address is the value in `rbase` plus the constant `offset`.
When the offset is zero we write it as lw rd, (rbase), which is shorthand for lw rd, 0(rbase).
A Note on mul vs. Shift
Multiplying by 4 is the same as shifting left by 2 bits, because 4 = 2². Many assembly programmers prefer the shift because it avoids the mul instruction:
slli t1, a1, 2 # t1 = i << 2 = i * 4 (offset)
add t2, a0, t1 # t2 = arr + offset
lw a0, (t2) # a0 = arr[i]
ret
Both are correct. The lecture's emphasis was on writing clear, correct code over clever optimization — pick whichever you understand best and use it consistently.
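The equivalence of the two offset computations is easy to check in C. The sketch below uses unsigned values so the shift is well defined for every input; the function names `offset_mul` and `offset_shift` are assumptions for this example.

```c
/* Byte offset of element i in an int array, computed with a multiply. */
unsigned long offset_mul(unsigned long i) {
    return i * 4UL;
}

/* The same offset computed with a left shift by 2 bits (4 = 2 * 2). */
unsigned long offset_shift(unsigned long i) {
    return i << 2;
}
```

Both functions return 12 for `i = 3`, and they agree for every other index as well, which is why either instruction sequence is acceptable.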
flowchart TD
A["i (in a1)"] --> B["offset = i * 4"]
C["arr base (in a0)"] --> D["addr = arr + offset"]
B --> D
D --> E["lw: value = memory at addr"]
E --> F["result in a0"]
4. Loading vs. Storing: The Direction of Data Flow
Reading an element uses `lw`; writing an element uses `sw`. The lecture stressed getting the direction right, because the two instructions look almost identical but move data opposite ways.
| Instruction | Form | Meaning | Direction |
|---|---|---|---|
| `lw` | `lw rd, (raddr)` | `rd = memory[raddr]` | memory → register |
| `sw` | `sw rsrc, (raddr)` | `memory[raddr] = rsrc` | register → memory |
The trap is that in both instructions the register operand is written first, even though for `sw` that register is the source, not the destination. Read it as:
- `lw a0, (t2)`: "load into `a0` from the address in `t2`."
- `sw a0, (t2)`: "store `a0` into the address in `t2`."
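A toy model in C can make the two directions harder to mix up. The sketch below is an illustration only: `memory`, `load_word`, and `store_word` are hypothetical names, the memory is a 64-byte array rather than real RAM, and there is no bounds checking, so callers are assumed to pass `addr` values with `addr + 4 <= 64`.

```c
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 64
static uint8_t memory[MEM_BYTES];   /* toy byte-addressable memory, zero-initialized */

/* Models lw rd, (raddr): data flows memory -> register.
   The return value plays the role of rd. */
int32_t load_word(uint32_t addr) {
    int32_t rd;
    memcpy(&rd, &memory[addr], sizeof rd);
    return rd;
}

/* Models sw rsrc, (raddr): data flows register -> memory.
   rsrc is listed first, as in the assembly syntax, even though it is the source. */
void store_word(int32_t rsrc, uint32_t addr) {
    memcpy(&memory[addr], &rsrc, sizeof rsrc);
}
```

After `store_word(99, 8)`, a later `load_word(8)` returns 99, while untouched locations still read as 0.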
Writing an Array Element
To implement arr[i] = x, compute the same address, then store instead of load:
# a0 - int arr[] (base address)
# a1 - int i (index)
# a2 - int x (value to store)
.global arr_set_s
arr_set_s:
li t0, 4 # element size
mul t1, a1, t0 # t1 = i * 4 (offset)
add t2, a0, t1 # t2 = arr + offset (address of arr[i])
sw a2, (t2) # memory[t2] = x (store the value)
ret
Notice the only differences from arr_get_s: the value to store comes in a third argument a2, and the final instruction is sw a2, (t2) instead of lw a0, (t2). There is no return value, so we do not touch a0.
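For comparison, a C version of the store in pointer form might look like the sketch below. The name `arr_set_c` follows the naming pattern of `arr_get_c` but is an assumption; it is not taken from the notes.

```c
/* Store x into arr[i] using pointer arithmetic.
   a0 would hold arr, a1 would hold i, a2 would hold x. */
void arr_set_c(int arr[], int i, int x) {
    *(arr + i) = x;   /* same as: arr[i] = x; */
}
```

The dereference appears on the left of the assignment here, which is the C-level signal that the assembly needs `sw` and not `lw`.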
5. Walking an Array in a Loop
A single element access is the building block; most array work is a loop over all elements. Lab 3's sum_array is the canonical example. Here is the C:
int sum_array(int arr[], int n) {
int sum = 0;
for (int i = 0; i < n; i++) {
sum = sum + arr[i];
}
return sum;
}
There are two common ways to translate the body. Both are correct; the second is slightly cleaner because it advances a pointer instead of recomputing the address each iteration.
Version A: Index-Based (recompute address each iteration)
# a0 - int arr[] (base address)
# a1 - int n (number of elements)
.global sum_array_s
sum_array_s:
li t0, 0 # t0 = sum = 0
li t1, 0 # t1 = i = 0
loop:
bge t1, a1, done # if i >= n, exit loop
slli t2, t1, 2 # t2 = i * 4 (byte offset)
add t2, a0, t2 # t2 = arr + offset (address of arr[i])
lw t3, (t2) # t3 = arr[i]
add t0, t0, t3 # sum = sum + arr[i]
addi t1, t1, 1 # i = i + 1
j loop
done:
mv a0, t0 # return value = sum
ret
Version B: Pointer-Walking (advance the pointer by 4 each iteration)
Instead of multiplying every time, keep a running pointer and bump it by the element size:
# a0 - int arr[] (base address, used as a moving pointer)
# a1 - int n (number of elements)
.global sum_array_s
sum_array_s:
li t0, 0 # t0 = sum = 0
li t1, 0 # t1 = i = 0
loop:
bge t1, a1, done # if i >= n, exit loop
lw t2, (a0) # t2 = *arr (current element)
add t0, t0, t2 # sum = sum + current element
addi a0, a0, 4 # arr = arr + 1 (advance one int = 4 bytes)
addi t1, t1, 1 # i = i + 1
j loop
done:
mv a0, t0 # return value = sum
ret
Version B is the assembly form of *(arr++) and shows why pointer arithmetic is so natural here: advancing to the next int is just addi a0, a0, 4. The note from Part 2 captured this directly: "To get to next element in array of words (ints): addi a0, a0, 4."
flowchart TD
Start["sum = 0, i = 0"] --> Check{"i < n?"}
Check -- "no" --> Done["return sum in a0"]
Check -- "yes" --> Load["load arr[i]"]
Load --> Add["sum += arr[i]"]
Add --> Bump["i++ and advance pointer"]
Bump --> Check
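The C shown earlier corresponds to Version A. A C counterpart to Version B, sketched here under the hypothetical name `sum_array_ptr`, advances the pointer itself, so each source line lines up with one or two instructions of the pointer-walking assembly.

```c
/* Sum n ints starting at arr by walking a pointer through the array. */
int sum_array_ptr(int *arr, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += *arr;   /* lw t2, (a0) then add t0, t0, t2 */
        arr++;         /* addi a0, a0, 4: C scales the ++ by sizeof(int) */
    }
    return sum;
}
```

Because `arr` is a by-value copy of the caller's pointer, incrementing it inside the function does not change the caller's array variable, just as overwriting `a0` does not change the caller's memory.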
6. Simple (Leaf) Functions
The second half of the lecture introduced simple functions, the kind you have been writing in Lab 3.
The Argument and Return Convention
The notes record the calling convention plainly:
- The first argument is in `a0`, the second in `a1`, and so on up through `a7`.
- A function may take eight arguments or fewer this way. (Beyond eight, additional arguments spill to the stack, which is out of scope for a simple function.)
- The single return value comes back in `a0`.
Because the return value also lands in a0, a function will often overwrite its own first argument. In arr_get_s above, a0 started as the array base and ended as the loaded element. That is fine inside the function, but it means the caller cannot assume a0 survives a call — if the caller still needs the original a0, it must save it first. (That caller-side saving is the subject of Project 2's full calling convention.)
What Is a Leaf Function?
The notes define a simple function with a labeled, braced block and three rules pointing at it:
A leaf function is a function that does not call any other function. Picture the call graph as a tree: a function that calls others has children; a function that calls nobody is a leaf of that tree.
flowchart TD
main["main"] --> f["foo (non-leaf)"]
f --> g["arr_get_s (leaf)"]
f --> h["min_s (leaf)"]
style g fill:#9f9,stroke:#333
style h fill:#9f9,stroke:#333
The green nodes are leaves: they sit at the bottom of the call tree and make no further calls.
The Two Rules for a Leaf Function
- Use only `a` and `t` registers. These are the caller-saved (also called temporary or "volatile") registers: `a0`–`a7` and `t0`–`t6`. A function is allowed to clobber them freely. Because a leaf function uses only these, it never disturbs any register that a function is obligated to preserve for its caller.
- No calls to other functions. This is what makes it a leaf. And it is why a leaf function is so simple: the only thing that would overwrite the return-address register `ra` is a `call`, and a leaf makes none, so `ra` is never disturbed and `ret` works without any saving.
Why a Leaf Needs No Stack Frame
Putting the two rules together yields the payoff: a leaf function does not need to allocate any stack space.
- It never calls another function, so `ra` (the return address) is never overwritten, and there is no need to save and restore it.
- It uses only caller-saved (`a`/`t`) registers, so there is nothing the callee is obligated to preserve.
The result is the clean shape you have seen in every Lab 3 program: a label, some computation in a/t registers, and ret. No addi sp, sp, -N prologue, no ld ra epilogue. Every Lab 3 function — quadratic, min, sum_array, find_max — is a leaf function.
# The leaf-function shape:
.global func_s
func_s:
# ... compute using only a* and t* registers ...
# ... no 'call' instructions ...
ret # ra was never touched, so this just works
Looking ahead: Project 2 introduces non-leaf functions (e.g. `findmaxfc` calls `max2_s`, `sort` calls `swap_s`). A non-leaf function must allocate a stack frame and save `ra` before it calls, because `call` overwrites `ra`. The contrast with leaf functions is exactly why we draw the line so carefully here.
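The leaf versus non-leaf distinction is visible at the C level too. The sketch below is illustrative only: `max2_c` and `findmaxfc_c` are hypothetical C counterparts to the names mentioned above, not the project's actual code, and `findmaxfc_c` assumes `n >= 1`.

```c
/* Leaf: the body contains no function calls. */
int max2_c(int a, int b) {
    return (a > b) ? a : b;
}

/* Non-leaf: calls max2_c inside its loop, so a hand-written assembly
   version would have to save ra before each call. Assumes n >= 1. */
int findmaxfc_c(int arr[], int n) {
    int best = arr[0];
    for (int i = 1; i < n; i++) {
        best = max2_c(best, arr[i]);
    }
    return best;
}
```

When translating by hand, the question to ask of each function is whether its body contains a call. An optimizing compiler may inline a small helper like `max2_c`, but hand-written assembly for the project would keep the call.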
7. Debugging Array and Function Code with GDB
The lecture demonstrated GDB for stepping through array access, since a wrong offset or a load/store mix-up is invisible from the program's printed output alone. A minimal workflow:
Inside GDB, the most useful commands for this material:
break find_max_s # stop at the start of our function
run 1 2 99 3 4 # run with command-line args
info registers a0 a1 # inspect the argument registers
stepi # execute one machine instruction
x/4xw $a0 # examine 4 words, in hex, at the address in a0
The x (examine) command is the key tool for arrays. x/4xw $a0 reads "examine 4 values, format hex, size word, starting at the address held in a0" — letting you see the array contents directly in memory even when they never get printed. Use it after computing an offset to confirm t2 really points at the element you intended.
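If it helps to see what "4 values, format hex, size word" means outside the debugger, the following C sketch formats the first `n` words at an address as 8-digit hex. It only approximates the idea behind `x/4xw`; it does not reproduce GDB's actual output layout, and `format_words_hex` is a hypothetical helper name.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes the first n words at p into out as space-separated 8-digit hex.
   Returns the number of characters written when everything fits;
   a return value >= cap signals that the output was truncated. */
size_t format_words_hex(char *out, size_t cap, const int *p, int n) {
    size_t used = 0;
    if (cap > 0) {
        out[0] = '\0';
    }
    for (int i = 0; i < n && used < cap; i++) {
        int w = snprintf(out + used, cap - used, "%s0x%08x",
                         (i == 0) ? "" : " ", (unsigned)p[i]);
        if (w < 0) {
            break;   /* snprintf reported an error */
        }
        used += (size_t)w;
    }
    return used;
}
```

For an array holding 10, 20, 30, 255, the buffer ends up containing `0x0000000a 0x00000014 0x0000001e 0x000000ff`, one word per element in memory order.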
Key Concepts
| Concept | Definition | Example |
|---|---|---|
| Array | Contiguous block of same-type elements in memory | `int arr[5];` → 20 bytes |
| Base address | Address of element 0; what an array name evaluates to | `arr` in `a0` |
| Index vs. byte offset | Index counts elements; offset counts bytes | `arr[3]` → offset `3 * 4 = 12` |
| `arr[i]` ≡ `*(arr + i)` | Indexing is defined as scaled pointer arithmetic | `arr[2]` = `*(arr + 2)` |
| Pointer scaling | C multiplies `+ i` by `sizeof(element)` automatically | `int *`: `+1` adds 4 bytes |
| `lw` | Load word: memory → register | `lw a0, (t2)` reads `arr[i]` |
| `sw` | Store word: register → memory | `sw a2, (t2)` writes `arr[i]` |
| Offset + base | The address formula for element access | `addr = arr + i*4` |
| Argument registers | First 8 args in `a0`–`a7`; return in `a0` | `a0 = arr`, `a1 = i` |
| Leaf function | Function that calls no other function | `arr_get_s`, `min_s` |
| Caller-saved registers | `a0`–`a7`, `t0`–`t6`: free to clobber in a leaf. `ra` is also caller-saved, but a leaf must leave it intact so `ret` works | use `t0`–`t3` freely |
Practice Problems
Problem 1: Index to Offset and Address
An int array's base address (in a0) is 0x1000. What byte offset and what absolute address correspond to arr[4]? Write the two RISC-V instructions that compute the address of arr[4] into t2.
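Solution: Each `int` is 4 bytes, so the offset for index 4 is `4 * 4 = 16` bytes, and the address is `0x1000 + 16 = 0x1010`. One possible two-instruction answer is `li t1, 16` followed by `add t2, a0, t1`. The offset can also be produced with an explicit multiply of the index by 4, as in `arr_get_s`.
Problem 2: Index Form ⇒ Pointer Form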
Rewrite each statement using only pointer notation (* and pointer arithmetic), with no [].
Solution: Each `arr[k]` is literally defined as `*(arr + k)`. Doing this rewrite first makes the assembly translation almost mechanical.
Problem 3: Spot the Load/Store Bug
A student wants to store the value in a2 into arr[i], but the autograder reports the array is unchanged. Here is their code. Find and fix the bug.
Solution: The instruction is `lw` (load), which copies memory **into** `a2`, the wrong direction. To store, use `sw`, which copies the register **into** memory, for example `sw a2, (t2)` when `t2` holds the computed address. Remember the direction: `lw` is memory → register, `sw` is register → memory. The register is always written first in the syntax, even when it is the source.
Problem 4: Translate get_second
Translate this leaf function to RISC-V assembly. It returns the element at index 1 of the array.
Solution: Index 1 has a byte offset of `1 * 4 = 4`, so we can fold the offset directly into the `lw` instruction. The form `lw a0, 4(a0)` uses the built-in immediate offset of the load instruction, so no separate `add` is needed; follow it with `ret`. A longer but equally correct version first computes the address into a temporary register and then loads through it with a zero offset.
Problem 5: Is It a Leaf?
For each function, state whether it is a leaf function and whether it needs to allocate a stack frame.
- (a) `min_s`: compares `a0` and `a1` with `blt` and returns the smaller in `a0`.
- (b) `findmaxfc_s`: loops over an array and calls `max2_s` to compare each pair.
- (c) `sum_array_s`: loops over an array accumulating a sum, no calls.
Solution:
- **(a) `min_s`**: Leaf (no `call`). No stack frame needed; uses only `a`/`t` registers and `ret` works because `ra` is untouched.
- **(b) `findmaxfc_s`**: **Not** a leaf: it calls `max2_s`. It **must** allocate a stack frame and save `ra` before the call, because `call` overwrites `ra`. (This is Project 2 territory.)
- **(c) `sum_array_s`**: Leaf (no `call`). No stack frame needed.
The deciding question is always: *does this function execute a `call` instruction?* If yes, it is non-leaf and must preserve `ra`.
Problem 6: Trace arr_get_s
Trace arr_get_s for arr = {10, 20, 30} at base address 0x2000 with i = 2. Show the value of each register after each instruction, and the final return value.
Solution: Start: `a0 = 0x2000` (base), `a1 = 2` (index).
| Instruction | Result |
|-------------|--------|
| `li t0, 4` | `t0 = 4` |
| `mul t1, a1, t0` | `t1 = 2 * 4 = 8` (offset) |
| `add t2, a0, t1` | `t2 = 0x2000 + 8 = 0x2008` (address of `arr[2]`) |
| `lw a0, (t2)` | `a0 = memory[0x2008] = 30` |
| `ret` | returns `a0 = 30` |
The function returns **30**, which is `arr[2]`. Note that `a0` was overwritten: it started as the base address `0x2000` and ended as the loaded value `30`. The caller must not rely on `a0` still holding the base after the call.
Further Reading
- RISC-V Reference Guide: instructions, registers, array access, and calling conventions used in this course.
- Key Concepts: per-assignment concept summaries, including array access and the function calling convention.
- Lab 3 Assignment: `quadratic`, `min`, `sum_array`, `find_max`.
- Project 2 Assignment: non-leaf functions, recursion, and the full calling convention.
- Handwritten lecture notes (PDF)
- RISC-V Cheat Sheet (PDF)
Summary
- `arr[i]` is defined as `*(arr + i)`: array indexing and pointer arithmetic are the same operation, so rewriting in pointer form first makes assembly translation straightforward.
- C scales pointer arithmetic by element size automatically; assembly does not, so we compute the byte offset `i * 4` ourselves for an `int` array.
- Array access is offset + base: `address = arr + (i * 4)`. Compute the offset, add the base, then dereference.
- `lw` loads (memory → register) and `sw` stores (register → memory). The register is written first in both, even when it is the source, so always check the direction.
- Arguments arrive in `a0`–`a7` (eight or fewer) and the return value leaves in `a0`; because the return reuses `a0`, callers cannot assume `a0` survives a call.
- A leaf function calls no other function, uses only caller-saved `a`/`t` registers, and therefore needs no stack frame: just compute and `ret`.
- Use a temporary register for the computed address (e.g. `t2`) so you can dereference it cleanly, and favor clear, correct code over premature optimization.
- GDB's `stepi` and `x/4xw $a0` let you watch offsets and loads directly in memory, which is the fastest way to catch a wrong offset or a load/store mix-up.